Probability, Statistics, 
and Random Processes 
for Electrical Engineering 
Third Edition 


Alberto Leon-Garcia 


University of Toronto 


Pearson Prentice Hall 


Upper Saddle River, NJ 07458 


Library of Congress Cataloging-in-Publication Data 


Leon-Garcia, Alberto. 

Probability, statistics, and random processes for electrical engineering / Alberto Leon-Garcia. -- 3rd ed. 

p.cm. 

Includes bibliographical references and index. 

ISBN-13: 978-0-13-147122-1 (alk. paper) 

1. Electric engineering--Mathematics. 2. Probabilities. 3. Stochastic processes. I. Leon-Garcia, Alberto. Probability 
and random processes for electrical engineering. II. Title. 

TK153.L425 2007 

519.202'46213--dc22 

2007046492 


Vice President and Editorial Director, ECS: Marcia J. Horton 
Associate Editor: Alice Dworkin 
Editorial Assistant: William Opaluch 
Senior Managing Editor: Scott Disanno 
Production Editor: Craig Little 

Art Director: Jayen Conte 

Cover Designer: Bruce Kenselaar 

Art Editor: Greg Dulles 
Manufacturing Manager: Alan Fischer 
Manufacturing Buyer: Lisa McDowell 
Marketing Manager: Tim Galligan 


© 2008 Pearson Education, Inc. 
Pearson Prentice Hall 
Pearson Education, Inc. 
Upper Saddle River, NJ 07458 


All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in 
writing from the publisher. 


Pearson Prentice Hall™ is a trademark of Pearson Education, Inc. MATLAB is a registered trademark of The 
MathWorks, Inc. All other product or brand names are trademarks or registered trademarks of their respective holders. 


The author and publisher of this book have used their best efforts in preparing this book. These efforts include the 
development, research, and testing of the theories and programs to determine their effectiveness. The author and 
publisher make no warranty of any kind, expressed or implied, with regard to the material contained in this book. The 
author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or 
arising out of, the furnishing, performance, or use of this material. 


Printed in the United States of America 
10 9 8 7 6 5 4 3 2 1 


ISBN 0-13-147122-8 
978-0-13-147122-1 


Pearson Education Ltd., London 

Pearson Education Australia Pty. Ltd., Sydney 
Pearson Education Singapore, Pte. Ltd. 

Pearson Education North Asia Ltd., Hong Kong 
Pearson Education Canada, Inc., Toronto 

Pearson Educación de Mexico, S.A. de C.V. 
Pearson Education-Japan, Tokyo 

Pearson Education Malaysia, Pte. Ltd. 

Pearson Education, Upper Saddle River, New Jersey 


To KAREN, CARLOS, MARISA, AND MICHAEL. 




Contents 


Preface 


CHAPTER 1 Probability Models in Electrical and Computer Engineering 

1.1 Mathematical Models as Tools in Analysis and Design 
1.2 Deterministic Models 
1.3 Probability Models 
1.4 A Detailed Example: A Packet Voice Transmission System 
1.5 Other Examples 
1.6 Overview of Book 
Summary 
Problems 


CHAPTER 2 Basic Concepts of Probability Theory 

2.1 Specifying Random Experiments 
2.2 The Axioms of Probability 
*2.3 Computing Probabilities Using Counting Methods 
2.4 Conditional Probability 
2.5 Independence of Events 
2.6 Sequential Experiments 
*2.7 Synthesizing Randomness: Random Number Generators 
*2.8 Fine Points: Event Classes 
*2.9 Fine Points: Probabilities of Sequences of Events 
Summary 
Problems 


CHAPTER 3 Discrete Random Variables 

3.1 The Notion of a Random Variable 
3.2 Discrete Random Variables and Probability Mass Function 
3.3 Expected Value and Moments of Discrete Random Variable 
3.4 Conditional Probability Mass Function 
3.5 Important Discrete Random Variables 
3.6 Generation of Discrete Random Variables 
Summary 
Problems 


CHAPTER 4 One Random Variable 

4.1 The Cumulative Distribution Function 
4.2 The Probability Density Function 
4.3 The Expected Value of X 
4.4 Important Continuous Random Variables 
4.5 Functions of a Random Variable 
4.6 The Markov and Chebyshev Inequalities 
4.7 Transform Methods 
4.8 Basic Reliability Calculations 
4.9 Computer Methods for Generating Random Variables 
*4.10 Entropy 
Summary 
Problems 


CHAPTER 5 Pairs of Random Variables 

5.1 Two Random Variables 
5.2 Pairs of Discrete Random Variables 
5.3 The Joint cdf of X and Y 
5.4 The Joint pdf of Two Continuous Random Variables 
5.5 Independence of Two Random Variables 
5.6 Joint Moments and Expected Values of a Function of Two Random Variables 
5.7 Conditional Probability and Conditional Expectation 
5.8 Functions of Two Random Variables 
5.9 Pairs of Jointly Gaussian Random Variables 
5.10 Generating Independent Gaussian Random Variables 
Summary 
Problems 


CHAPTER 6 Vector Random Variables 

6.1 Vector Random Variables 
6.2 Functions of Several Random Variables 
6.3 Expected Values of Vector Random Variables 
6.4 Jointly Gaussian Random Vectors 
6.5 Estimation of Random Variables 
6.6 Generating Correlated Vector Random Variables 
Summary 
Problems 


CHAPTER 7 Sums of Random Variables and Long-Term Averages 

7.1 Sums of Random Variables 
7.2 The Sample Mean and the Laws of Large Numbers 
    Weak Law of Large Numbers 
    Strong Law of Large Numbers 
7.3 The Central Limit Theorem 
    Central Limit Theorem 
*7.4 Convergence of Sequences of Random Variables 
*7.5 Long-Term Arrival Rates and Associated Averages 
7.6 Calculating Distributions Using the Discrete Fourier Transform 
Summary 
Problems 


CHAPTER 8 Statistics 

8.1 Samples and Sampling Distributions 
8.2 Parameter Estimation 
8.3 Maximum Likelihood Estimation 
8.4 Confidence Intervals 
8.5 Hypothesis Testing 
8.6 Bayesian Decision Methods 
8.7 Testing the Fit of a Distribution to Data 
Summary 
Problems 


CHAPTER 9 Random Processes 

9.1 Definition of a Random Process 
9.2 Specifying a Random Process 
9.3 Discrete-Time Processes: Sum Process, Binomial Counting Process, and Random Walk 
9.4 Poisson and Associated Random Processes 
9.5 Gaussian Random Processes, Wiener Process and Brownian Motion 
9.6 Stationary Random Processes 
9.7 Continuity, Derivatives, and Integrals of Random Processes 
9.8 Time Averages of Random Processes and Ergodic Theorems 
*9.9 Fourier Series and Karhunen-Loeve Expansion 
9.10 Generating Random Processes 
Summary 
Problems 


CHAPTER 10 Analysis and Processing of Random Signals 

10.1 Power Spectral Density 
10.2 Response of Linear Systems to Random Signals 
10.3 Bandlimited Random Processes 
10.4 Optimum Linear Systems 
*10.5 The Kalman Filter 
*10.6 Estimating the Power Spectral Density 
10.7 Numerical Techniques for Processing Random Signals 
Summary 
Problems 


CHAPTER 11 Markov Chains 

11.1 Markov Processes 
11.2 Discrete-Time Markov Chains 
11.3 Classes of States, Recurrence Properties, and Limiting Probabilities 
11.4 Continuous-Time Markov Chains 
11.5 Time-Reversed Markov Chains 
11.6 Numerical Techniques for Markov Chains 
Summary 
Problems 


CHAPTER 12 Introduction to Queueing Theory 

12.1 The Elements of a Queueing System 
12.2 Little’s Formula 
12.3 The M/M/1 Queue 
12.4 Multi-Server Systems: M/M/c, M/M/c/c, and M/M/∞ 
12.5 Finite-Source Queueing Systems 
12.6 M/G/1 Queueing Systems 
12.7 M/G/1 Analysis Using Embedded Markov Chains 
12.8 Burke’s Theorem: Departures From M/M/c Systems 
12.9 Networks of Queues: Jackson’s Theorem 
12.10 Simulation and Data Analysis of Queueing Systems 
Summary 
Problems 


Appendices 

Mathematical Tables 
Tables of Fourier Transforms 
Matrices and Linear Algebra 


Index 


Preface 


This book provides a carefully motivated, accessible, and interesting introduction to 
probability, statistics, and random processes for electrical and computer engineers. The 
complexity of the systems encountered in engineering practice calls for an understand- 
ing of probability concepts and a facility in the use of probability tools. The goal of the 
introductory course should therefore be to teach both the basic theoretical concepts 
and techniques for solving problems that arise in practice. The third edition of this 
book achieves this goal by retaining the proven features of previous editions: 


• Relevance to engineering practice 
• Clear and accessible introduction to probability 
• Computer exercises to develop intuition for randomness 
• Large number and variety of problems 
• Curriculum flexibility through rich choice of topics 
• Careful development of random process concepts. 


This edition also introduces two major new features: 


• Introduction to statistics 
• Extensive use of MATLAB®/Octave. 


RELEVANCE TO ENGINEERING PRACTICE 


Motivating students is a major challenge in introductory probability courses. Instructors 
need to respond by showing students the relevance of probability theory to engineering 
practice. Chapter 1 addresses this challenge by discussing the role of probability models 
in engineering design. Practical current applications from various areas of electrical and 
computer engineering are used to show how averages and relative frequencies provide 
the proper tools for handling the design of systems that involve randomness. These ap- 
plication areas include wireless and digital communications, digital media and signal 
processing, system reliability, computer networks, and Web systems. These areas are 
used in examples and problems throughout the text. 


ACCESSIBLE INTRODUCTION TO PROBABILITY THEORY 


Probability theory is an inherently mathematical subject so concepts must be presented 
carefully, simply, and gradually. The axioms of probability and their corollaries are devel- 
oped in a clear and deliberate manner. The model-building aspect is introduced through 
the assignment of probability laws to discrete and continuous sample spaces. The notion 
of a single discrete random variable is developed in its entirety, allowing the student to 
focus on the basic probability concepts without analytical complications. Similarly, pairs 
of random variables and vector random variables are discussed in separate chapters. 

The most important random variables and random processes are developed in 
systematic fashion using model-building arguments. For example, a systematic devel- 
opment of concepts can be traced across every chapter from the initial discussions on 
coin tossing and Bernoulli trials, through the Gaussian random variable, central limit 
theorem, and confidence intervals in the middle chapters, and on to the Wiener process 
and the analysis of simulation data at the end of the book. The goal is not only to teach the 
student the fundamental concepts and methods of probability, but also to develop 
an awareness of the key models and their interrelationships. 


COMPUTER EXERCISES TO DEVELOP INTUITION FOR RANDOMNESS 


A true understanding of probability requires developing an intuition for variability 
and randomness. The development of an intuition for randomness can be aided by the 
presentation and analysis of random data. Where applicable, important concepts are 
motivated and reinforced using empirical data. Every chapter introduces one or more 
numerical or simulation techniques that enable the student to apply and validate the 
concepts. Topics covered include: generation of random numbers, random variables, 
and random vectors; linear transformations and application of the FFT; application of 
statistical tests; simulation of random processes, Markov chains, and queueing models; 
statistical signal processing; and analysis of simulation data. 

The sections on computer methods are optional. However, we have found that 
computer-generated data is very effective in motivating each new topic and that the 
computer methods can be incorporated into existing lectures. The computer exercises 
can be done using MATLAB or Octave. We opted to use Octave in the examples 
because it is sufficient to perform our exercises and it is free and readily available on the 
Web. Students with access can use MATLAB instead. 


STATISTICS TO LINK PROBABILITY MODELS TO THE REAL WORLD 


Statistics plays the key role of bridging probability models to the real world, and for this 
reason there is a trend in introductory undergraduate probability courses to include an 
introduction to statistics. This edition includes a new chapter that covers all the main 
topics in an introduction to statistics: sampling distributions, parameter estimation, 
maximum likelihood estimation, confidence intervals, hypothesis testing, Bayesian 
decision methods, and goodness-of-fit tests. The foundation of random variables from earlier 
chapters allows us to develop statistical methods in a rigorous manner rather than 
present them in “cookbook” fashion. In this chapter MATLAB/Octave prove extremely 
useful in the generation of random data and the application of statistical methods. 


EXAMPLES AND PROBLEMS 


Numerous examples in every section are used to demonstrate analytical and problem- 
solving techniques, develop concepts using simplified cases, and illustrate applications. 
The text includes 1200 problems, nearly double the number in the previous edition. A 
large number of new problems involve the use of MATLAB or Octave to obtain 
numerical or simulation results. Problems are identified by section to help the 
instructor select homework problems. Additional problems requiring cumulative knowledge 
are provided at the end of each chapter. Answers to selected problems are included on 
the book website. A Student Solutions Manual accompanies this text to develop 
problem-solving skills. A sampling of 25% of carefully worked out problems has been 
selected to help students understand concepts presented in the text. An Instructor 
Solutions Manual with complete solutions is also available on the book website. 


http://www.prenhall.com/leongarcia 


FROM RANDOM VARIABLES TO RANDOM PROCESSES 


Discrete-time random processes provide a crucial “bridge” in going from random vari- 
ables to continuous-time random processes. Care is taken in the first seven chapters to 
lay the proper groundwork for this transition. Thus sequences of dependent experiments 
are discussed in Chapter 2 as a preview of Markov chains. In Chapter 6, emphasis is 
placed on how a joint distribution generates a consistent family of marginal distributions. 
Chapter 7 introduces sequences of independent identically distributed (iid) random vari- 
ables. Chapter 8 uses the sum of an iid sequence to develop important examples of ran- 
dom processes. 

The traditional introductory course in random processes has focused on applica- 
tions from linear systems and random signal analysis. However, many courses now also 
include an introduction to Markov chains and some examples from queueing theory. 
We provide sufficient material in both topic areas to give the instructor leeway in strik- 
ing a balance between these two areas. Here we continue our systematic development 
of related concepts. Thus, the development of random signal analysis includes a discus- 
sion of the sampling theorem which is used to relate discrete-time signal processing to 
continuous-time signal processing. In a similar vein, the embedded chain formulation 
of continuous-time Markov chains is emphasized and later used to develop simulation 
models for continuous-time queueing systems. 


FLEXIBILITY THROUGH RICH CHOICE OF TOPICS 


The textbook is designed to allow the instructor maximum flexibility in the selection of 
topics. In addition to the standard topics taught in introductory courses on probability, 
random variables, statistics and random processes, the book includes sections on mod- 
eling, computer simulation, reliability, estimation and entropy, as well as chapters that 
provide introductions to Markov chains and queueing theory. 


SUGGESTED SYLLABI 


A variety of syllabi for undergraduate and graduate courses are supported by the text. 
The flow chart below shows the basic chapter dependencies, and the table of contents 
provides a detailed description of the sections in each chapter. 

[Flow chart of chapter dependencies. Left column of boxes: 1. Probability Models; 
2. Basic Concepts; 3. Discrete Random Variables; 4. Continuous Random Variables; 
5. Pairs of Random Variables; 6. Vector Random Variables; 7. Sums of Random Variables; 
8. Statistics; 9. Random Processes. Right column of boxes: 1. Review Chapters 1-5 
(2.8 Event Classes, 2.9 Borel Fields, 3.1 Random Variable, 4.1 Limiting Properties of CDF); 
6. Vector Random Variables; 7. Sums of Random Variables (7.4 Sequences of Random 
Variables); 9. Random Processes; 10. Analysis & Processing of Random Signals; 
11. Markov Chains; 12. Queueing Theory.] 

The first five chapters (without the starred or optional sections) form the basis for 
a one-semester undergraduate introduction to probability. A course on probability and 
statistics would proceed from Chapter 5 to the first three sections of Chapter 7 and then 
to Chapter 8. A first course on probability with a brief introduction to random processes 
would go from Chapter 5 to Sections 6.1, 7.1 to 7.3, and then the first few sections in 
Chapter 9, as time allows. Many other syllabi are possible using the various optional sections. 

A first-level graduate course in random processes would begin with a quick re- 
view of the axioms of probability and the notion of a random variable, including the 
starred sections on event classes (2.8), Borel fields and continuity of probability (2.9), 
the formal definition of a random variable (3.1), and the limiting properties of the cdf 
(4.1). The material in Chapter 6 on vector random variables, their joint distributions, 
and their transformations would be covered next. The discussion in Chapter 7 would 
include the central limit theorem and convergence concepts. The course would then 
cover Chapters 9, 10, and 11. A statistical signal processing emphasis can be given to 
the course by including the sections on estimation of random variables (6.5), maxi- 
mum likelihood estimation and Cramer-Rao lower bound (8.3) and Bayesian decision 
methods (8.6). An emphasis on queueing models is possible by including renewal 
processes (7.5) and Chapter 12. We note in particular that the last section in Chapter 
12 provides an introduction to simulation models and output data analysis not found 
in most textbooks. 


CHANGES IN THE THIRD EDITION 


This edition of the text has undergone several major changes: 


• The introduction to the notion of a random variable is now carried out in two 
phases: discrete random variables (Chapter 3) and continuous random variables 
(Chapter 4). 

• Pairs of random variables and vector random variables are now covered in 
separate chapters (Chapters 5 and 6). More advanced topics have been placed in 
Chapter 6, e.g., general transformations, joint characteristic functions. 

• Chapter 8, a new chapter, provides an introduction to all of the standard topics on 
statistics. 

• Chapter 9 now provides separate and more detailed development of the random 
walk, Poisson, and Wiener processes. 

• Chapter 10 has expanded the coverage of discrete-time linear systems, and the 
link between discrete-time and continuous-time processing is bridged through 
the discussion of the sampling theorem. 

• Chapter 11 now provides a complete coverage of discrete-time Markov chains 
before introducing continuous-time Markov chains. A new section shows how 
transient behavior can be investigated through numerical and simulation techniques. 

• Chapter 12 now provides detailed discussions on the simulation of queueing 
systems and the analysis of simulation data. 


CHAPTER 1 


Probability Models 
in Electrical and 
Computer Engineering 


Electrical and computer engineers have played a central role in the design of modern 
information and communications systems. These highly successful systems work reli- 
ably and predictably in highly variable and chaotic environments: 


• Wireless communication networks provide voice and data communications to 
mobile users in severe interference environments. 

• The vast majority of media signals (voice, audio, images, and video) are processed 
digitally. 

• Huge Web server farms deliver vast amounts of highly specific information to 
users. 


Because of these successes, designers today face even greater challenges. The 
systems they build are unprecedented in scale and the chaotic environments in which they 
must operate are untrodden territory: 


• Web information is created and posted at an accelerating rate; future search 
applications must become more discerning to extract the required response from a 
vast ocean of information. 

• Information-age scoundrels hijack computers and exploit these for illicit 
purposes, so methods are needed to identify and contain these threats. 

• Machine learning systems must move beyond browsing and purchasing 
applications to real-time monitoring of health and the environment. 

• Massively distributed systems in the form of peer-to-peer and grid computing 
communities have emerged and changed the nature of media delivery, gaming, 
and social interaction; yet we do not understand or know how to control and 
manage such systems. 


Probability models are one of the tools that enable the designer to make sense 
out of the chaos and to successfully build systems that are efficient, reliable, and cost 
effective. This book is an introduction to the theory underlying probability models as 
well as to the basic techniques used in the development of such models. 

This chapter introduces probability models and shows how they differ from the 
deterministic models that are pervasive in engineering. The key properties of the 
notion of probability are developed, and various examples from electrical and computer 
engineering, where probability models play a key role, are presented. Section 1.6 gives 
an overview of the book. 


1.1 MATHEMATICAL MODELS AS TOOLS IN ANALYSIS AND DESIGN 


The design or modification of any complex system involves the making of choices from 
various feasible alternatives. Choices are made on the basis of criteria such as cost, re- 
liability, and performance. The quantitative evaluation of these criteria is seldom made 
through the actual implementation and experimental evaluation of the alternative con- 
figurations. Instead, decisions are made based on estimates that are obtained using 
models of the alternatives. 

A model is an approximate representation of a physical situation. A model at- 
tempts to explain observed behavior using a set of simple and understandable rules. 
These rules can be used to predict the outcome of experiments involving the given 
physical situation. A useful model explains all relevant aspects of a given situation. 
Such models can be used instead of experiments to answer questions regarding the 
given situation. Models therefore allow the engineer to avoid the costs of experimenta- 
tion, namely, labor, equipment, and time. 

Mathematical models are used when the observational phenomenon has measur- 
able properties. A mathematical model consists of a set of assumptions about how a 
system or physical process works. These assumptions are stated in the form of mathe- 
matical relations involving the important parameters and variables of the system. The 
conditions under which an experiment involving the system is carried out determine the 
“givens” in the mathematical relations, and the solution of these relations allows us to 
predict the measurements that would be obtained if the experiment were performed. 

Mathematical models are used extensively by engineers in guiding system design 
and modification decisions. Intuition and rules of thumb are not always reliable in pre- 
dicting the performance of complex and novel systems, and experimentation is not pos- 
sible during the initial phases of a system design. Furthermore, the cost of extensive 
experimentation in existing systems frequently proves to be prohibitive. The availabil- 
ity of adequate models for the components of a complex system combined with a 
knowledge of their interactions allows the scientist and engineer to develop an overall 
mathematical model for the system. It is then possible to quickly and inexpensively an- 
swer questions about the performance of complex systems. Indeed, computer pro- 
grams for obtaining the solution of mathematical models form the basis of many 
computer-aided analysis and design systems. 

In order to be useful, a model must fit the facts of a given situation. Therefore the 
process of developing and validating a model necessarily consists of a series of experi- 
ments and model modifications as shown in Fig. 1.1. Each experiment investigates a 
certain aspect of the phenomenon under investigation and involves the taking of ob- 
servations and measurements under a specified set of conditions. The model is used 
to predict the outcome of the experiment, and these predictions are compared with 
the actual observations that result when the experiment is carried out. If there is a 
significant discrepancy, the model is then modified to account for it. The modeling 
process continues until the investigator is satisfied that the behavior of all relevant 
aspects of the phenomenon can be predicted to within a desired accuracy. It should be 
emphasized that the decision of when to stop the modeling process depends on the 
immediate objectives of the investigator. Thus a model that is adequate for one 
application may prove to be completely inadequate in another setting. 


FIGURE 1.1 
The modeling process. 

The predictions of a mathematical model should be treated as hypothetical until 
the model has been validated through a comparison with experimental measure- 
ments. A dilemma arises in a system design situation: The model cannot be validated 
experimentally because the real system does not exist. Computer simulation models 
play a useful role in this situation by presenting an alternative means of predicting sys- 
tem behavior, and thus a means of checking the predictions made by a mathematical 
model. A computer simulation model consists of a computer program that simulates or 
mimics the dynamics of a system. Incorporated into the program are instructions that 
“measure” the relevant performance parameters. In general, simulation models are 
capable of representing systems in greater detail than mathematical models. However, 
they tend to be less flexible and usually require more computation time than 
mathematical models. 

In the following two sections we discuss the two basic types of mathematical 
models, deterministic models and probability models. 


1.2 DETERMINISTIC MODELS 


In deterministic models the conditions under which an experiment is carried out deter- 
mine the exact outcome of the experiment. In deterministic mathematical models, the 
solution of a set of mathematical equations specifies the exact outcome of the experi- 
ment. Circuit theory is an example of a deterministic mathematical model. 

Circuit theory models the interconnection of electronic devices by ideal circuits 
that consist of discrete components with idealized voltage-current characteristics. The 
theory assumes that the interaction between these idealized components is completely 
described by Kirchhoff’s voltage and current laws. For example, Ohm’s law states that 
the voltage-current characteristic of a resistor is I = V/R. The voltages and currents in 
any circuit consisting of an interconnection of batteries and resistors can be found by 
solving a system of simultaneous linear equations that is found by applying Kirchhoff’s 
laws and Ohm’s law. 
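
As an illustration of such a deterministic model, the following minimal sketch solves a small battery-and-resistor circuit. The circuit is hypothetical and not taken from the text: a battery V drives resistor R1 in series with the parallel pair R2 and R3. Applying Kirchhoff's voltage law and Ohm's law to the two meshes gives two linear equations, which the sketch solves by Cramer's rule; the function names and component values are assumptions made for illustration only.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve the 2x2 linear system A x = b by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system: no unique solution")
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2


def mesh_currents(v, r1, r2, r3):
    """Mesh currents (i1, i2) for a hypothetical two-mesh circuit.

    Mesh 1 contains the battery v, r1, and the shared resistor r3.
    Mesh 2 contains r2 and the shared resistor r3.
    Kirchhoff's voltage law with Ohm's law gives:
        (r1 + r3) * i1 - r3 * i2 = v
        -r3 * i1 + (r2 + r3) * i2 = 0
    """
    v, r1, r2, r3 = (Fraction(x) for x in (v, r1, r2, r3))
    return solve_2x2(r1 + r3, -r3, -r3, r2 + r3, v, Fraction(0))


# Illustrative values: 10 V battery, R1 = 2 ohms, R2 = R3 = 4 ohms.
i1, i2 = mesh_currents(10, 2, 4, 4)
print("i1 =", i1, "A, i2 =", i2, "A")

# A deterministic model predicts the same outcome on every repetition.
assert mesh_currents(10, 2, 4, 4) == (i1, i2)
```

Because the same conditions (V, R1, R2, R3) always produce the same currents, repeating the computation cannot change the prediction, which is the defining feature of a deterministic model.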

If an experiment involving the measurement of a set of voltages is repeated a 
number of times under the same conditions, circuit theory predicts that the observa- 
tions will always be exactly the same. In practice there will be some variation in the ob- 
servations due to measurement errors and uncontrolled factors. Nevertheless, this 
deterministic model will be adequate as long as the deviation about the predicted val- 
ues remains small. 


1.3 PROBABILITY MODELS 


Many systems of interest involve phenomena that exhibit unpredictable variation and 
randomness. We define a random experiment to be an experiment in which the 
outcome varies in an unpredictable fashion when the experiment is repeated under the 
same conditions. Deterministic models are not appropriate for random experiments 
since they predict the same outcome for each repetition of an experiment. In this 
section we introduce probability models that are intended for random experiments. 

As an example of a random experiment, suppose a ball is selected from an urn 
containing three identical balls, labeled 0, 1, and 2. The urn is first shaken to 
randomize the position of the balls, and a ball is then selected. The number of the ball is noted, 
and the ball is then returned to the urn. The outcome of this experiment is a number 
from the set S = {0, 1, 2}. We call the set S of all possible outcomes the sample space. 
Figure 1.2 shows the outcomes in 100 repetitions (trials) of a computer simulation of 
this urn experiment. It is clear that the outcome of this experiment cannot 
consistently be predicted correctly. 


FIGURE 1.2 
Outcomes of urn experiment. 


1.3.1 Statistical Regularity 


In order to be useful, a model must enable us to make predictions about the future be- 
havior of a system, and in order to be predictable, a phenomenon must exhibit regu- 
larity in its behavior. Many probability models in engineering are based on the fact 
that averages obtained in long sequences of repetitions (trials) of random experi- 
ments consistently yield approximately the same value. This property is called 
statistical regularity. 

Suppose that the above urn experiment is repeated $n$ times under identical 
conditions. Let $N_0(n)$, $N_1(n)$, and $N_2(n)$ be the number of times in which the outcomes are 
balls 0, 1, and 2, respectively, and let the relative frequency of outcome $k$ be defined by 

$$f_k(n) = \frac{N_k(n)}{n}. \tag{1.1}$$

By statistical regularity we mean that $f_k(n)$ varies less and less about a constant value 
as $n$ is made large, that is, 

$$\lim_{n \to \infty} f_k(n) = p_k. \tag{1.2}$$

The constant $p_k$ is called the probability of the outcome $k$. Equation (1.2) states that 
the probability of an outcome is the long-term proportion of times it arises in a long 
sequence of trials. We will see throughout the book that Eq. (1.2) provides the key 
connection in going from the measurement of physical quantities to the probability 
models discussed in this book. 

Figures 1.3 and 1.4 show the relative frequencies for the three outcomes in the
above urn experiment as the number of trials n is increased. It is clear that all the relative
frequencies are converging to the value 1/3. This is in agreement with our intuition that
the three outcomes are equiprobable.

FIGURE 1.3
Relative frequencies in urn experiment.

FIGURE 1.4
Relative frequencies in urn experiment.

Suppose we alter the above urn experiment by placing in the urn a fourth identi- 
cal ball with the number 0. The probability of the outcome 0 is now 2/4 since two of the 
four balls in the urn have the number 0. The probabilities of the outcomes 1 and 2 
would be reduced to 1/4 each. This demonstrates a key property of probability models, 
namely, the conditions under which a random experiment is performed determine the 
probabilities of the outcomes of an experiment. 
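
The behavior described above can be explored with a short simulation. The following is a minimal sketch in Python, not the program used to produce the book's figures; the function name `relative_frequencies`, the fixed seed, and the trial counts are illustrative assumptions.

```python
import random
from collections import Counter

def relative_frequencies(urn, n_trials, seed=0):
    """Draw n_trials balls with replacement and return f_k(n) = N_k(n)/n."""
    rng = random.Random(seed)  # fixed seed (an assumption) so a run can be repeated
    counts = Counter(rng.choice(urn) for _ in range(n_trials))
    return {label: counts[label] / n_trials for label in sorted(set(urn))}

if __name__ == "__main__":
    # Original urn: balls labeled 0, 1, and 2.
    for n in (10, 100, 10_000, 100_000):
        print(n, relative_frequencies([0, 1, 2], n))
    # Altered urn: a fourth ball, also labeled 0, has been added.
    print(relative_frequencies([0, 0, 1, 2], 100_000))
```

For small n the relative frequencies fluctuate noticeably; for large n they should settle near 1/3 for the original urn and near 1/2, 1/4, and 1/4 for the altered urn.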


Properties of Relative Frequency 


We now present several properties of relative frequency. Suppose that a random experiment
has K possible outcomes, that is, $S = \{1, 2, \ldots, K\}$. Since the number of occurrences
of any outcome in n trials is a number between zero and n, we have that

$$0 \le N_k(n) \le n \quad \text{for } k = 1, 2, \ldots, K,$$

and thus dividing the above inequality by n, we find that each relative frequency is a
number between zero and one:

$$0 \le f_k(n) \le 1 \quad \text{for } k = 1, 2, \ldots, K. \tag{1.3}$$

The sum of the number of occurrences of all possible outcomes must be n:

$$\sum_{k=1}^{K} N_k(n) = n.$$

If we divide both sides of the above equation by n, we find that the sum of all the
relative frequencies equals one:

$$\sum_{k=1}^{K} f_k(n) = 1. \tag{1.4}$$


Sometimes we are interested in the occurrence of events associated with the outcomes
of an experiment. For example, consider the event "an even-numbered ball is selected"
in the above urn experiment. What is the relative frequency of this event? The
event will occur whenever the number of the ball is 0 or 2. The number of experiments
in which the outcome is an even-numbered ball is therefore $N_E(n) = N_0(n) + N_2(n)$.
The relative frequency of the event is thus

$$f_E(n) = \frac{N_E(n)}{n} = \frac{N_0(n) + N_2(n)}{n} = f_0(n) + f_2(n).$$


This example shows that the relative frequency of an event is the sum of the relative
frequencies of the associated outcomes. More generally, let C be the event "A or B occurs,"
where A and B are two events that cannot occur simultaneously. Then the number
of times when C occurs is $N_C(n) = N_A(n) + N_B(n)$, so

$$f_C(n) = f_A(n) + f_B(n). \tag{1.5}$$


Equations (1.3), (1.4), and (1.5) are the three basic properties of relative frequency 
from which we can derive many other useful results. 
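
The three properties can be checked numerically on simulated data. The sketch below assumes the three-ball urn of this section; the helper names `relative_frequency` and `simulate_urn` are hypothetical and chosen only for illustration.

```python
import random
from math import isclose

def relative_frequency(trials, event):
    """Fraction of trials whose outcome belongs to the event (a set of outcomes)."""
    return sum(1 for outcome in trials if outcome in event) / len(trials)

def simulate_urn(n_trials, seed=1):
    """n_trials draws, with replacement, from the urn holding balls 0, 1, 2."""
    rng = random.Random(seed)
    return [rng.choice([0, 1, 2]) for _ in range(n_trials)]

if __name__ == "__main__":
    trials = simulate_urn(1000)
    f = {k: relative_frequency(trials, {k}) for k in (0, 1, 2)}
    f_even = relative_frequency(trials, {0, 2})
    print(all(0 <= v <= 1 for v in f.values()))  # property (1.3)
    print(isclose(sum(f.values()), 1.0))         # property (1.4)
    print(isclose(f_even, f[0] + f[2]))          # property (1.5) for the event "even"
```

The comparisons use `isclose` rather than exact equality because the relative frequencies are stored as floating-point numbers.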
1.3.4


The Axiomatic Approach to a Theory of Probability 


Equation (1.2) suggests that we define the probability of an event by its long-term rel- 
ative frequency. There are problems with using this definition of probability to develop 
a mathematical theory of probability. First of all, it is not clear when and in what
mathematical sense the limit in Eq. (1.2) exists. Second, we can never perform an
experiment an infinite number of times, so we can never know the probabilities $p_k$ exactly.
Finally, the use of relative frequency to define probability would rule out the
applicability of probability theory to situations in which an experiment cannot be repeated.
Thus it makes practical sense to develop a mathematical theory of probability that is 
not tied to any particular application or to any particular notion of what probability 
means. On the other hand, we must insist that, when appropriate, the theory should 
allow us to use our intuition and interpret probability as relative frequency. 

In order to be consistent with the relative frequency interpretation, any definition 
of “probability of an event” must satisfy the properties in Eqs. (1.3) through (1.5). The 
modern theory of probability begins with a construction of a set of axioms that specify 
that probability assignments must satisfy these properties. It supposes that: (1) a ran- 
dom experiment has been defined, and a set S of all possible outcomes has been identi- 
fied; (2) a class of subsets of S called events has been specified; and (3) each event A has 
been assigned a number, P[A], in such a way that the following axioms are satisfied: 


1. $0 \le P[A] \le 1$.

2. $P[S] = 1$.

3. If A and B are events that cannot occur simultaneously,
then $P[A \text{ or } B] = P[A] + P[B]$.


The correspondence between the three axioms and the properties of relative frequen- 
cy stated in Eqs. (1.3) through (1.5) is apparent. These three axioms lead to many use- 
ful and powerful results. Indeed, we will spend the remainder of this book developing 
many of these results. 

Note that the theory of probability does not concern itself with how the proba- 
bilities are obtained or with what they mean. Any assignment of probabilities to events 
that satisfies the above axioms is legitimate. It is up to the user of the theory, the model 
builder, to determine what the probability assignment should be and what interpreta- 
tion of probability makes sense in any given application. 
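
For a finite sample space, one simple way to build a legitimate assignment is to give each outcome a number between 0 and 1, with the numbers summing to 1, and to define P[A] as the sum over the outcomes in A. The sketch below illustrates this under those stated assumptions; it is not the general construction, and the names `is_valid_assignment` and `prob` are hypothetical.

```python
from fractions import Fraction

def is_valid_assignment(p):
    """Check that outcome probabilities lie in [0, 1] and sum to 1 (axioms 1 and 2)."""
    return all(0 <= v <= 1 for v in p.values()) and sum(p.values()) == 1

def prob(event, p):
    """P[A] for an event A given as a set of outcomes; additive by construction (axiom 3)."""
    return sum(p[k] for k in event)

if __name__ == "__main__":
    # Urn with four balls labeled 0, 0, 1, 2, as in Section 1.3.1.
    p = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
    print(is_valid_assignment(p))
    print(prob({0, 2}, p))           # the event "an even-numbered ball is selected"
    print(prob({0, 1, 2}, p) == 1)   # P[S] = 1
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.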


Building a Probability Model 


Let us consider how we proceed from a real-world problem that involves randomness 
to a probability model for the problem. The theory requires that we identify the ele- 
ments in the above axioms. This involves (1) defining the random experiment inherent 
in the application, (2) specifying the set S of all possible outcomes and the events of in- 
terest, and (3) specifying a probability assignment from which the probabilities of all 
events of interest can be computed. The challenge is to develop the simplest model that 
explains all the relevant aspects of the real-world problem. 

As an example, suppose that we test a telephone conversation to determine 
whether a speaker is currently speaking or silent. We know that on the average the 
typical speaker is active only 1/3 of the time; the rest of the time he is listening to the 
other party or pausing between words and phrases. We can model this physical situation
as an urn experiment in which we select a ball from an urn containing two white
balls (silence) and one black ball (active speech). We are making a great simplification 
here; not all speakers are the same, not all languages have the same silence-activity 
behavior, and so forth. The usefulness and power of this simplification becomes ap- 
parent when we begin asking questions that arise in system design, such as: What is 
the probability that more than 24 speakers out of 48 independent speakers are active 
at the same time? This question is equivalent to: What is the probability that more 
than 24 black balls are selected in 48 independent repetitions of the above urn exper- 
iment? By the end of Chapter 2 you will be able to answer the latter question and all 
the real-world problems that can be reduced to it! 
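
As a preview, the urn question can be evaluated numerically. The sketch below assumes that each of the 48 draws yields a black ball with probability 1/3 independently of the others, so that the number of black balls follows a binomial distribution; that result is developed in Chapter 2 and is taken here as an assumption. The function names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_more_than(m, n, p):
    """Probability of strictly more than m successes in n independent trials."""
    return sum(binomial_pmf(k, n, p) for k in range(m + 1, n + 1))

if __name__ == "__main__":
    # More than 24 black balls (active speakers) in 48 independent draws.
    print(prob_more_than(24, 48, 1 / 3))
```

Under these assumptions the probability comes out small, which is the observation the system design in Section 1.4 relies on.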


A DETAILED EXAMPLE: A PACKET VOICE TRANSMISSION SYSTEM 


In the beginning of this chapter we claimed that probability models provide a tool that 
enables the designer to successfully design systems that must operate in a random en- 
vironment, but that nevertheless are efficient, reliable, and cost effective. In this sec- 
tion, we present a detailed example of such a system. Our objective here is to convince 
you of the power and usefulness of probability theory. The presentation intentionally 
draws upon your intuition. Many of the derivation steps that may appear nonrigorous 
now will be made precise later in the book. 

Suppose that a communication system is required to transmit 48 simultaneous 
conversations from site A to site B using “packets” of voice information. The speech of 
each speaker is converted into voltage waveforms that are first digitized (i.e., convert- 
ed into a sequence of binary numbers) and then bundled into packets of information 
that correspond to 10-millisecond (ms) segments of speech. A source and destination 
address is appended to each voice packet before it is transmitted (see Fig. 1.5). 

The simplest design for the communication system would transmit 48 packets 
every 10 ms in each direction. This is an inefficient design, however, since it is known 
that on the average about 2/3 of all packets contain silence and hence no speech infor- 
mation. In other words, on the average the 48 speakers only produce about 48/3 = 16 
active (nonsilence) packets per 10-ms period. We therefore consider another system 
that transmits only M < 48 packets every 10 ms. 

Every 10 ms, the new system determines which speakers have produced packets 
with active speech. Let the outcome of this random experiment be A, the number of ac- 
tive packets produced in a given 10-ms segment. The quantity A takes on values in the 
range from 0 (all speakers silent) to 48 (all speakers active). If A ≤ M, then all the active
packets are transmitted. However, if A > M, then the system is unable to transmit all
the active packets, so A − M of the active packets are selected at random and discarded.
The discarding of active packets results in the loss of speech, so we would like to keep the 
fraction of discarded active packets at a level that the speakers do not find objectionable. 

First consider the relative frequencies of A. Suppose the above experiment is repeated
n times. Let A(j) be the outcome in the jth trial. Let $N_k(n)$ be the number of trials
in which the number of active packets is k. The relative frequency of the outcome k in the
first n trials is then $f_k(n) = N_k(n)/n$, which we suppose converges to a probability $p_k$:

$$\lim_{n \to \infty} f_k(n) = p_k, \quad 0 \le k \le 48. \tag{1.6}$$

FIGURE 1.5
A packet voice transmission system. 


In Chapter 2 we will derive the probability $p_k$ that k speakers are active. Figure 1.6
shows $p_k$ versus k. It can be seen that the most frequent number of active speakers is 16
and that the number of active speakers is seldom above 24 or so.

Next consider the rate at which active packets are produced. The average number
of active packets produced per 10-ms interval is given by the sample mean of the number
of active packets:

$$\langle A \rangle_n = \frac{1}{n} \sum_{j=1}^{n} A(j) \tag{1.7}$$

$$= \frac{1}{n} \sum_{k=0}^{48} k N_k(n). \tag{1.8}$$

The first expression adds the number of active packets produced in the first n trials in the
order in which the observations were recorded. The second expression counts how many
of these observations had k active packets for each possible value of k, and then computes
the total.¹ As n gets large, the ratio $N_k(n)/n$ in the second expression approaches
$p_k$. Thus the average number of active packets produced per 10-ms segment approaches

$$\langle A \rangle_n \to \sum_{k=0}^{48} k p_k = E[A]. \tag{1.9}$$

¹ Suppose you pull out the following change from your pocket: 1 quarter, 1 dime, 1 quarter, 1 nickel.
Equation (1.7) says your total is 25 + 10 + 25 + 5 = 65 cents. Equation (1.8) says your total is
(1)(5) + (1)(10) + (2)(25) = 65 cents.

FIGURE 1.6
Probabilities for number of active speakers in a group of 48. 


The expression on the right-hand side will be defined as the expected value of A in
Section 3.3. E[A] is completely determined by the probabilities $p_k$, and in Chapter 3 we
will show that E[A] = 48 × 1/3 = 16. Equation (1.9) states that the long-term average
number of active packets produced per 10-ms period is E[A] = 16 packets per 10 ms.
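
The two ways of computing the sample mean in Eqs. (1.7) and (1.8), and the limit in Eq. (1.9), can be illustrated with a short simulation. This is a sketch under an assumption the text has not yet justified: each of the 48 speakers is active independently with probability 1/3, so that $p_k$ is binomial. All function names are hypothetical.

```python
import random
from collections import Counter
from math import comb

N_SPEAKERS = 48
P_ACTIVE = 1 / 3  # assumed probability that a given speaker is active

def simulate_active_counts(n_trials, seed=0):
    """Return A(1), ..., A(n): the number of active speakers in each 10-ms trial."""
    rng = random.Random(seed)
    return [sum(rng.random() < P_ACTIVE for _ in range(N_SPEAKERS))
            for _ in range(n_trials)]

def sample_mean_in_order(a):
    """Eq. (1.7): add the observations in the order they were recorded."""
    return sum(a) / len(a)

def sample_mean_by_value(a):
    """Eq. (1.8): weight each value k by the number of times N_k(n) it occurred."""
    counts = Counter(a)
    return sum(k * counts[k] for k in range(N_SPEAKERS + 1)) / len(a)

def expected_value():
    """Right-hand side of Eq. (1.9), using the assumed binomial probabilities p_k."""
    return sum(k * comb(N_SPEAKERS, k) * P_ACTIVE**k * (1 - P_ACTIVE)**(N_SPEAKERS - k)
               for k in range(N_SPEAKERS + 1))

if __name__ == "__main__":
    a = simulate_active_counts(5000)
    print(sample_mean_in_order(a), sample_mean_by_value(a), expected_value())
```

The two sample means agree for every data set, since they are the same sum arranged differently; both should lie close to the expected value when n is large.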

The information provided by the probabilities $p_k$ allows us to design systems that
are efficient and that provide good voice quality. For example, we can cut the
transmission capacity in half, to 24 packets per 10-ms period, while discarding an
imperceptible number of active packets.
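
One way to quantify "imperceptible" is the long-term fraction of active packets that are discarded, taken here as the average number discarded per period, E[max(A − M, 0)], divided by the average number produced, E[A]. This measure and the binomial form of $p_k$ (independent speakers, each active with probability 1/3) are assumptions made for this sketch; the function names are hypothetical.

```python
from math import comb

N_SPEAKERS = 48
P_ACTIVE = 1 / 3  # assumed per-speaker activity probability

def pmf(k):
    """Assumed binomial probability that exactly k speakers are active."""
    return comb(N_SPEAKERS, k) * P_ACTIVE**k * (1 - P_ACTIVE)**(N_SPEAKERS - k)

def discard_fraction(m):
    """Long-term fraction of active packets discarded when only m can be sent."""
    expected_discarded = sum((k - m) * pmf(k) for k in range(m + 1, N_SPEAKERS + 1))
    expected_active = sum(k * pmf(k) for k in range(N_SPEAKERS + 1))
    return expected_discarded / expected_active

if __name__ == "__main__":
    for m in (16, 20, 24, 32, 48):
        print(m, discard_fraction(m))
```

Printing the fraction for several values of M shows the trade-off between transmission capacity and lost speech.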

Let us summarize what we have done in this section. We have presented an ex- 
ample in which the system behavior is intrinsically random, and in which the system 
performance measures are stated in terms of long-term averages. We have shown how 
these long-term measures lead to expressions involving the probabilities of the various 
outcomes. Finally we have indicated that, in some cases, probability theory allows us to 
derive these probabilities. We are then able to predict the long-term averages of vari- 
ous quantities of interest and proceed with the system design. 


OTHER EXAMPLES 


In this section we present further examples from electrical and computer engineering, 
where probability models are used to design systems that work in a random environ- 
ment. Our intention here is to show how probabilities and long-term averages arise 
naturally as performance measures in many systems. We hasten to add, however, that 
this book is intended to present the basic concepts of probability theory and not detailed
applications. For the interested reader, references for further reading are provided
at the end of this and other chapters.


Communication over Unreliable Channels 


Many communication systems operate in the following way. Every T seconds, the 
transmitter accepts a binary input, namely, a 0 or a 1, and transmits a corresponding sig- 
nal. At the end of the T seconds, the receiver makes a decision as to what the input was, 
based on the signal it has received. Most communications systems are unreliable in the 
sense that the decision of the receiver is not always the same as the transmitter input. 
Figure 1.7(a) models systems in which transmission errors occur at random with prob- 
ability e. As indicated in the figure, the output is not equal to the input with probabili- 
ty e. Thus e is the long-term proportion of bits delivered in error by the receiver. In 
situations where this error rate is not acceptable, error-control techniques are intro- 
duced to reduce the error rate in the delivered information. 

One method of reducing the error rate in the delivered information is to use 
error-correcting codes as shown in Fig. 1.7(b). As a simple example, consider a repeti- 
tion code where each information bit is transmitted three times: 


0 → 000
1 → 111.


If we suppose that the decoder makes a decision on the information bit by taking a
majority vote of the three bits output by the receiver, then the decoder will make the
wrong decision only if two or three of the bits are in error. In Example 2.37, we show
that this occurs with probability $3e^2 - 2e^3$. Thus if the bit error rate of the channel
without coding is $10^{-3}$, then the delivered bit error rate with the above simple code will be
about $3 \times 10^{-6}$, a reduction of more than two orders of magnitude! This improvement is
obtained at a cost, however: The rate of transmission of information has been slowed down to 1 bit
every 3T seconds. By going to longer, more complicated codes, it is possible to obtain
reductions in error rate without the drastic reduction in transmission rate of this simple
example.

FIGURE 1.7
(a) A model for a binary communication channel. (b) Error control system.
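
The error probability of the repetition code can be checked by enumerating the eight possible error patterns. The sketch below assumes that the three transmitted bits are corrupted independently, each with probability e; the function name is hypothetical.

```python
from itertools import product

def majority_vote_error_probability(e):
    """Probability that two or three of the three transmitted bits are in error."""
    total = 0.0
    for pattern in product((0, 1), repeat=3):  # a 1 marks a bit received in error
        n_errors = sum(pattern)
        if n_errors >= 2:                      # the majority vote then decides wrongly
            total += e**n_errors * (1 - e)**(3 - n_errors)
    return total

if __name__ == "__main__":
    e = 1e-3
    print(majority_vote_error_probability(e), 3 * e**2 - 2 * e**3)
```

The enumeration adds three patterns with exactly two errors, each of probability $e^2(1 - e)$, and one pattern with three errors, of probability $e^3$, which simplifies to $3e^2 - 2e^3$.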

Error detection and correction methods play a key role in making reliable 
communications possible over radio and other noisy channels. Probability plays a 
role in determining the error patterns that are likely to occur and that hence must 
be corrected. 


Compression of Signals 


The outcome of a random experiment need not be a single number, but can also be an 
entire function of time. For example, the outcome of an experiment could be a voltage 
waveform corresponding to speech or music. In these situations we are interested in 
the properties of a signal and of processed versions of the signal. 

For example, suppose we are interested in compressing a music signal S(t). This 
involves representing the signal by a sequence of bits. Compression techniques provide 
efficient representations by using prediction, where the next value of the signal is pre- 
dicted using past encoded values. Only the error in the prediction needs to be encoded 
so the number of bits can be reduced. 

In order to work, prediction systems require that we know how the signal values 
are correlated with each other. Given this correlation structure we can then design op- 
timum prediction systems. Probability plays a key role in solving these problems. Com- 
pression systems have been highly successful and are found in cell phones, digital 
cameras, and camcorders. 
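
The benefit of prediction can be seen on a toy correlated signal. The sketch below is purely illustrative: it assumes a first-order autoregressive signal with coefficient a = 0.9 and unit-variance Gaussian noise, and it predicts each sample from the previous true sample rather than from past encoded values, a simplification of what a real coder does. None of these choices comes from the text.

```python
import random

def ar1_signal(n, a=0.9, seed=0):
    """Generate n samples of x[i] = a * x[i-1] + w[i], with w[i] unit-variance Gaussian."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

def prediction_errors(x, a=0.9):
    """Error left after predicting each sample as a times the previous sample."""
    return [x[i] - a * x[i - 1] for i in range(1, len(x))]

def variance(values):
    """Average squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

if __name__ == "__main__":
    x = ar1_signal(20_000)
    print(variance(x), variance(prediction_errors(x)))
```

Because neighboring samples are strongly correlated, the prediction error has a much smaller variance than the signal itself, so fewer bits are needed to represent it to a given accuracy.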


Reliability of Systems 


Reliability is a major concern in the design of modern systems. A prime example is the 
system of computers and communication networks that support the electronic transfer 
of funds between banks. It is of critical importance that this system continues operating 
even in the face of subsystem failures. The key question is, How does one build reliable 
systems from unreliable components? Probability models provide us with the tools to 
address this question in a quantitative way. 

The operation of a system requires the operation of some or all of its compo- 
nents. For example, Fig. 1.8(a) shows a system that functions only when all of its com- 
ponents are functioning, and Fig. 1.8(b) shows a system that functions as long as at least 
one of its components is functioning. More complex systems can be obtained as combi- 
nations of these two basic configurations. 

We all know from experience that it is not possible to predict exactly when a
component will fail. Probability theory allows us to evaluate measures of reliability
such as the average time to failure and the probability that a component is still
functioning after a certain time has elapsed. Furthermore, we will see in Chapters 2 and 4
that probability theory enables us to determine these averages and probabilities for an
entire system in terms of the probabilities and averages of its components. This allows
us to evaluate system configurations in terms of their reliability, and thus to select
system designs that are reliable.

(a) Series configuration of components. (b) Parallel configuration of components.

FIGURE 1.8
Systems with n components.
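
For the two basic configurations, the system reliability has a simple form if the components are assumed to fail independently of one another. That independence assumption, the example probabilities, and the function names below are not from the text; the sketch only indicates the kind of calculation developed in later chapters.

```python
from math import prod

def series_reliability(p):
    """Series system: works only if every component works."""
    return prod(p)

def parallel_reliability(p):
    """Parallel system: works if at least one component works."""
    return 1 - prod(1 - q for q in p)

if __name__ == "__main__":
    p = [0.9, 0.9, 0.9]  # assumed probability that each component is functioning
    print(series_reliability(p), parallel_reliability(p))
```

With components of equal quality, the series system is less reliable than any single component, while the parallel system is more reliable.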


Resource-Sharing Systems 


Many applications involve sharing resources that are subject to unsteady and random 
demand. Clients intersperse demands for short periods of service between relatively 
long idle periods. The demands of the clients can be met by dedicating sufficient re- 
sources to each individual client, but this approach can be wasteful because the re- 
sources go unused when a client is idle. A better approach is to configure systems 
where client demands are met through dynamic sharing of resources. 

For example, many Web server systems operate as shown in Fig. 1.9. These systems
allow up to c clients to be connected to a server at any given time. Clients submit
queries to the server. The query is placed in a waiting line and then processed by the
server. After receiving the response from the server, each client spends some time
thinking before placing the next query. The system closes an existing client's connection
after a timeout period, and replaces it with a new client.

FIGURE 1.9
Simple model for Web server system.

FIGURE 1.10
A large community of users interacting across the Internet.

The system needs to be configured to provide rapid responses to clients, to avoid 
premature closing of connections, and to utilize the computing resources effectively. 
This requires the probabilistic characterization of the query processing time, the num- 
ber of clicks per connection, and the time between clicks (think time). These parame- 
ters are then used to determine the optimum value of c as well as the timeout value. 


Internet Scale Systems 


One of the major challenges today is the design of Internet-scale systems as the
client-server systems of Fig. 1.9 evolve into massively distributed systems, as in Fig. 1.10. 
In these new systems the number of users who are online at the same time can be in the 
tens of thousands and in the case of peer-to-peer systems in the millions. 

The interactions among users of the Internet are much more complex than those 
of clients accessing a server. For example, the links in Web pages that point to other 
Web pages create a vast web of interconnected documents. The development of 
graphing and mapping techniques to represent these logical relationships is key to un- 
derstanding user behavior. A variety of Web crawling techniques have been devel- 
oped to produce such graphs [Broder]. Probabilistic techniques can assess the relative 
importance of nodes in these graphs and, indeed, play a central role in the operation 
of search engines. New applications, such as peer-to-peer file sharing and content distribution,
create new communities with their own interconnectivity patterns and
graphs. The behavior of users in these communities can have dramatic impact on the 
volume, patterns, and dynamics of traffic flows in the Internet. Probabilistic methods 
are playing an important role in understanding these systems and in developing meth- 
ods to manage and control resources so that they operate in reliable and predictable 
fashion [15]. 


OVERVIEW OF BOOK 


In this chapter we have discussed the important role that probability models play in the 
design of systems that involve randomness. The principal objective of this book is to in- 
troduce the student to the basic concepts of probability theory that are required to under- 
stand probability models used in electrical and computer engineering. The book is not 
intended to cover applications per se; there are far too many applications, with each one 
requiring its own detailed discussion. On the other hand, we do attempt to keep the ex- 
amples relevant to the intended audience by drawing from relevant application areas. 
Another objective of the book is to present some of the basic techniques required to 
develop probability models. The discussion in this chapter has made it clear that the 
probabilities used in a model must be determined experimentally. Statistical techniques 
are required to do this, so we have included an introduction to the basic but essential 
statistical techniques. We have also alluded to the usefulness of computer simulation 
models in validating probability models. Most chapters include a section that presents 
some useful computer method. These sections are optional and can be skipped without 
loss of continuity. However, the student is encouraged to explore these techniques. 
They are fun to play with, and they will provide insight into the nature of randomness. 
The remainder of the book is organized as follows: 


• Chapter 2 presents the basic concepts of probability theory. We begin with the axioms
  of probability that were stated in Section 1.3 and discuss their implications.
  Several basic probability models are introduced in Chapter 2.

• In general, probability theory does not require that the outcomes of random experiments
  be numbers. Thus the outcomes can be objects (e.g., black or white
  balls) or conditions (e.g., computer system up or down). However, we are usually
  interested in experiments where the outcomes are numbers. The notion of a random
  variable addresses this situation. Chapters 3 and 4 discuss experiments
  where the outcome is a single number from a discrete set or a continuous set,
  respectively. In these two chapters we develop several extremely useful
  problem-solving techniques.

• Chapter 5 discusses pairs of random variables and introduces methods for describing
  the correlation or interdependence between random variables. Chapter 6
  extends these methods to vector random variables.

• Chapter 7 presents mathematical results (limit theorems) that answer the question
  of what happens in a very long sequence of independent repetitions of an
  experiment. The results presented will justify our extensive use of relative
  frequency to motivate the notion of probability.

• Chapter 8 provides an introduction to basic statistical methods.

• Chapter 9 introduces the notion of a random or stochastic process, which is simply
  an experiment in which the outcome is a function of time.

• Chapter 10 introduces the notion of the power spectral density and its use in the
  analysis and processing of random signals.

• Chapter 11 discusses Markov chains, which are random processes that allow us to
  model sequences of nonindependent experiments.

• Chapter 12 presents an introduction to queueing theory and various applications.


SUMMARY


• Mathematical models relate important system parameters and variables using
  mathematical relations. They allow system designers to predict system performance
  by using equations when experimentation is not feasible or too costly.

• Computer simulation models are an alternative means of predicting system performance.
  They can be used to validate mathematical models.

• In deterministic models the conditions under which an experiment is performed
  determine the exact outcome. The equations in deterministic models predict an
  exact outcome.

• In probability models the conditions under which a random experiment is performed
  determine the probabilities of the possible outcomes. The solution of the
  equations in probability models yields the probabilities of outcomes and events
  as well as various types of averages.

• The probabilities and averages for a random experiment can be found experimentally
  by computing relative frequencies and sample averages in a large number
  of repetitions of a random experiment.

• The performance measures in many systems of practical interest involve relative
  frequencies and long-term averages. Probability models are used in the design of
  these systems.


CHECKLIST OF IMPORTANT TERMS 


Deterministic model
Event
Expected value
Probability
Probability model
Random experiment
Relative frequency
Sample mean
Sample space
Statistical regularity


ANNOTATED REFERENCES 


References [1] through [5] discuss probability models in an engineering context. 
References [6] and [7] are classic works, and they contain excellent discussions on 
the foundations of probability models. Reference [8] is an introduction to error 
control. Reference [9] discusses random signal analysis in the context of communication
systems, and references [10] and [11] discuss various aspects of random signal
analysis. References [12] and [13] are introductions to performance aspects of com- 
puter communications. 


1. A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic
   Processes, 4th ed., McGraw-Hill, New York, 2002.
2. D. P. Bertsekas and J. N. Tsitsiklis, Introduction to Probability, Athena Scientific,
   Belmont, MA, 2002.
3. T. L. Fine, Probability and Probabilistic Reasoning for Electrical Engineering,
   Prentice Hall, Upper Saddle River, N.J., 2006.
4. H. Stark and J. W. Woods, Probability and Random Processes with Applications to
   Signal Processing, 3d ed., Prentice Hall, Upper Saddle River, N.J., 2002.
5. R. D. Yates and D. J. Goodman, Probability and Stochastic Processes, Wiley, New
   York, 2005.
6. H. Cramer, Mathematical Methods of Statistics, Princeton University Press,
   Princeton, N.J., 1946.
7. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New
   York, 1968.
8. S. Lin and R. Costello, Error Control Coding: Fundamentals and Applications,
   Prentice Hall, Upper Saddle River, N.J., 2005.
9. S. Haykin, Communications Systems, 4th ed., Wiley, New York, 2000.
10. A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing,
    2d ed., Prentice Hall, Upper Saddle River, N.J., 1999.
11. J. Gibson, T. Berger, and T. Lookabough, Digital Compression and Multimedia,
    Morgan Kaufmann Publishers, San Francisco, 1998.
12. L. Kleinrock, Queueing Theory, Volume 1: Theory, Wiley, New York, 1975.
13. D. Bertsekas and R. G. Gallager, Data Networks, Prentice Hall, Upper Saddle
    River, N.J., 1987.
14. Broder et al., "Graph Structure in the Web," Proceedings of the 9th international
    World Wide Web conference on Computer networks: the international journal
    of computer and telecommunications networking, North-Holland, The Netherlands,
    2000.
15. P. Baldi et al., Modeling the Internet and the Web, Wiley, Hoboken, N.J., 2003.


PROBLEMS


1.1. Consider the following three random experiments:

Experiment 1: Toss a coin. 

Experiment 2: Toss a die. 

Experiment 3: Select a ball at random from an urn containing balls numbered 0 to 9. 
(a) Specify the sample space of each experiment. 


(b) Find the relative frequency of each outcome in each of the above experiments in a 
large number of repetitions of the experiment. Explain your answer. 


1.2. Explain how the following experiments are equivalent to random urn experiments:
(a) Flip a fair coin twice. 
(b) Toss a pair of fair dice. 


(c) Draw two cards from a deck of 52 distinct cards, with replacement after the first 
draw; without replacement after the first draw. 


Explain under what conditions the following experiments are equivalent to a random 
coin toss. What is the probability of heads in the experiment? 


(a) Observe a pixel (dot) in a scanned black-and-white document. 
(b) Receive a binary signal in a communication system. 

(c) Test whether a device is working. 

(d) Determine whether your friend Joe is online. 


(e) Determine whether a bit error has occurred in a transmission over a noisy communi- 
cation channel. 


An urn contains three electronically labeled balls with labels 00, 01, 10. Lisa, Homer, and 
Bart are asked to characterize the random experiment that involves selecting a ball at ran- 
dom and reading the label. Lisa’s label reader works fine; Homer’s label reader has the 
most significant digit stuck at 1; Bart’s label reader’s least significant digit is stuck at 0. 


(a) What is the sample space determined by Lisa, Homer, and Bart? 

(b) What are the relative frequencies observed by Lisa, Homer, and Bart in a large num- 
ber of repetitions of the experiment? 

A random experiment has sample space S = {1, 2, 3, 4} with probabilities $p_1 = 1/2$,
$p_2 = 1/4$, $p_3 = 1/8$, $p_4 = 1/8$.

(a) Describe how this random experiment can be simulated using tosses of a fair coin. 

(b) Describe how this random experiment can be simulated using an urn experiment. 

(c) Describe how this experiment can be simulated using a deck of 52 distinct cards. 

A random experiment consists of selecting two balls in succession from an urn containing
two black balls and one white ball.

(a) Specify the sample space for this experiment. 

(b) Suppose that the experiment is modified so that the ball is immediately put back into 
the urn after the first selection. What is the sample space now? 

(c) What is the relative frequency of the outcome (white, white) in a large number of 
repetitions of the experiment in part a? In part b? 

(d) Does the outcome of the second draw from the urn depend in any way on the out- 
come of the first draw in either of these experiments? 

Let A be an event associated with outcomes of a random experiment, and let the event B
be defined as “event A does not occur.” Show that f_B(n) = 1 − f_A(n).

Let A, B, and C be events that cannot occur simultaneously as pairs or triplets, and let D
be the event “A or B or C occurs.” Show that

f_D(n) = f_A(n) + f_B(n) + f_C(n).


The sample mean for a series of numerical outcomes X (1), X(2),..., X(n) of a se- 
quence of random experiments is defined by 


\[ \langle X \rangle_n = \frac{1}{n} \sum_{j=1}^{n} X(j). \]


Show that the sample mean satisfies the recursion formula: 


Suppose that the signal 2 cos 2πt is sampled at random instants of time.
(a) Find the long-term sample mean.

(b) Find the long-term relative frequency of the events “voltage is positive”; “voltage is
less than −2.”

(c) Do the answers to parts a and b change if the sampling times are periodic and taken
every τ seconds?

In order to generate a random sequence of random numbers you take a column of tele- 

phone numbers and output a “0” if the last digit in the telephone number is even and a 

“1” if the digit is odd. Discuss how one could determine if the resulting sequence is “ran- 

dom.” What test would you apply to the relative frequencies of single outcomes? Of pairs 

of outcomes? 


CHAPTER 2

Basic Concepts
of Probability Theory


This chapter presents the basic concepts of probability theory. In the remainder of the
book, we will usually be further developing or elaborating the basic concepts presented
here. You will be well prepared to deal with the rest of the book if you have a good
understanding of these basic concepts when you complete the chapter.

The following basic concepts will be presented. First, set theory is used to specify
the sample space and the events of a random experiment. Second, the axioms of probability
specify rules for computing the probabilities of events. Third, the notion of conditional
probability allows us to determine how partial information about the outcome
of an experiment affects the probabilities of events. Conditional probability also allows
us to formulate the notion of “independence” of events and of experiments. Finally, we
consider “sequential” random experiments that consist of performing a sequence of
simple random subexperiments. We show how the probabilities of events in these experiments
can be derived from the probabilities of the simpler subexperiments. Throughout
the book it is shown that complex random experiments can be analyzed by decomposing
them into simple subexperiments.


2.1 SPECIFYING RANDOM EXPERIMENTS


A random experiment is an experiment in which the outcome varies in an unpre- 
dictable fashion when the experiment is repeated under the same conditions. A ran- 
dom experiment is specified by stating an experimental procedure and a set of one or 
more measurements or observations. 


Example 2.1 


Experiment E1: Select a ball from an urn containing balls numbered 1 to 50. Note the number of
the ball.

Experiment E2: Select a ball from an urn containing balls numbered 1 to 4. Suppose that balls 1
and 2 are black and that balls 3 and 4 are white. Note the number and color of the ball you select.

Experiment E3: Toss a coin three times and note the sequence of heads and tails.

Experiment E4: Toss a coin three times and note the number of heads.

Experiment E5: Count the number of voice packets containing only silence produced from a
group of N speakers in a 10-ms period.

Experiment E6: A block of information is transmitted repeatedly over a noisy channel until an
error-free block arrives at the receiver. Count the number of transmissions required.

Experiment E7: Pick a number at random between zero and one.

Experiment E8: Measure the time between page requests in a Web server.

Experiment E9: Measure the lifetime of a given computer memory chip in a specified environment.

Experiment E10: Determine the value of an audio signal at time t1.

Experiment E11: Determine the values of an audio signal at times t1 and t2.

Experiment E12: Pick two numbers at random between zero and one.

Experiment E13: Pick a number X at random between zero and one, then pick a number Y at
random between zero and X.

Experiment E14: A system component is installed at time t = 0. For t ≥ 0 let X(t) = 1 as long
as the component is functioning, and let X(t) = 0 after the component fails.


The specification of a random experiment must include an unambiguous statement
of exactly what is measured or observed. For example, random experiments may consist
of the same procedure but differ in the observations made, as illustrated by E3 and E4.

A random experiment may involve more than one measurement or observation,
as illustrated by E2, E3, E11, E12, and E13. A random experiment may even involve a
continuum of measurements, as shown by E14.

Experiments E3, E4, E5, E6, E12, and E13 are examples of sequential experiments
that can be viewed as consisting of a sequence of simple subexperiments. Can
you identify the subexperiments in each of these? Note that in E13 the second subexperiment
depends on the outcome of the first subexperiment.


The Sample Space 


Since random experiments do not consistently yield the same result, it is necessary to 
determine the set of possible results. We define an outcome or sample point of a ran- 
dom experiment as a result that cannot be decomposed into other results. When we 
perform a random experiment, one and only one outcome occurs. Thus outcomes are 
mutually exclusive in the sense that they cannot occur simultaneously. The sample 
space S of a random experiment is defined as the set of all possible outcomes. 

We will denote an outcome of an experiment by ζ, where ζ is an element or point
in S. Each performance of a random experiment can then be viewed as the selection at
random of a single point (outcome) from S.

The sample space S can be specified compactly by using set notation. It can be visu- 
alized by drawing tables, diagrams, intervals of the real line, or regions of the plane. There 
are two basic ways to specify a set: 


1. List all the elements, separated by commas, inside a pair of braces: 
A = {0,1,2,3}, 

2. Give a property that specifies the elements of the set: 
A = {x : x is an integer such that 0 ≤ x ≤ 3}.


Note that the order in which items are listed does not change the set, e.g., {0, 1, 2, 3} 
and {1, 2, 3, 0} are the same set. 
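
The two ways of specifying a set carry over directly to a programming language. The following is a minimal Python sketch, not part of the original text: the variable names and the finite search window used for the property-based form are assumptions made for illustration.

```python
# Way 1: list all the elements explicitly.
A_listed = {0, 1, 2, 3}

# Way 2: give a property that specifies the elements.
# range(-10, 11) is an arbitrary finite window of integers to test (an assumption).
A_property = {x for x in range(-10, 11) if 0 <= x <= 3}

# The order in which elements are listed does not change the set.
same_set = ({0, 1, 2, 3} == {1, 2, 3, 0})
```

Both constructions yield the same set object, which mirrors the statement that a set is determined by its elements and not by how it is written down.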
Example 2.2


The sample spaces corresponding to the experiments in Example 2.1 are given below using set
notation:


S1 = {1, 2, ..., 50}

S2 = {(1, b), (2, b), (3, w), (4, w)}

S3 = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}

S4 = {0, 1, 2, 3}

S5 = {0, 1, 2, ..., N}

S6 = {1, 2, 3, ...}

S7 = {x : 0 ≤ x ≤ 1} = [0, 1]   See Fig. 2.1(a).

S8 = {t : t ≥ 0} = [0, ∞)

S9 = {t : t ≥ 0} = [0, ∞)   See Fig. 2.1(b).

S10 = {v : −∞ < v < ∞} = (−∞, ∞)

S11 = {(v1, v2) : −∞ < v1 < ∞ and −∞ < v2 < ∞}

S12 = {(x, y) : 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1}   See Fig. 2.1(c).

S13 = {(x, y) : 0 ≤ y ≤ x ≤ 1}   See Fig. 2.1(d).

S14 = set of functions X(t) for which X(t) = 1 for 0 ≤ t < t0 and X(t) = 0 for t ≥ t0,
where t0 > 0 is the time when the component fails.

Random experiments involving the same experimental procedure may have different
sample spaces, as shown by Experiments E3 and E4. Thus the purpose of an experiment
affects the choice of sample space.
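
The relation between the sample spaces of the two coin-tossing experiments can be sketched in a few lines of Python. This is an illustrative sketch only; the names S3 and S4 follow the example above, and the use of `itertools.product` is our choice, not the text's.

```python
from itertools import product

# S3: every ordered sequence of three tosses is a distinct outcome.
S3 = {"".join(seq) for seq in product("HT", repeat=3)}

# S4: same procedure, but only the number of heads is observed,
# so several outcomes of S3 collapse onto one outcome of S4.
S4 = {outcome.count("H") for outcome in S3}
```

The same physical procedure yields eight outcomes when the full sequence is recorded and four when only the head count is recorded.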


FIGURE 2.1
Sample spaces for Experiments E7, E9, E12, and E13: (a) sample space for Experiment E7;
(b) sample space for Experiment E9; (c) sample space for Experiment E12; (d) sample space
for Experiment E13.
There are three possibilities for the number of outcomes in a sample space. A
sample space can be finite, countably infinite, or uncountably infinite. We call S a
discrete sample space if S is countable; that is, its outcomes can be put into one-to-one
correspondence with the positive integers. We call S a continuous sample space if S is
not countable. Experiments E1, E2, E3, E4, and E5 have finite discrete sample spaces.
Experiment E6 has a countably infinite discrete sample space. Experiments E7 through
E13 have continuous sample spaces.

Since an outcome of an experiment can consist of one or more observations or
measurements, the sample space S can be multi-dimensional. For example, the outcomes
in Experiments E2, E11, E12, and E13 are two-dimensional, and those in Experiment
E3 are three-dimensional. In some instances, the sample space can be written as
the Cartesian product of other sets.¹ For example, S11 = R × R, where R is the set of
real numbers, and S3 = S × S × S, where S = {H, T}.

It is sometimes convenient to let the sample space include outcomes that are
impossible. For example, in Experiment E9 it is convenient to define the sample
space as the positive real line, even though a device cannot have an infinite lifetime.


Events 


We are usually not interested in the occurrence of specific outcomes, but rather in
the occurrence of some event (i.e., whether the outcome satisfies certain conditions).
This requires that we consider subsets of S. We say that A is a subset of B if
every element of A also belongs to B. For example, in Experiment E10, which involves
the measurement of a voltage, we might be interested in the event “signal
voltage is negative.” The conditions of interest define a subset of the sample space,
namely, the set of points ζ from S that satisfy the given conditions. For example,
“voltage is negative” corresponds to the set {ζ : −∞ < ζ < 0}. The event occurs if
and only if the outcome of the experiment ζ is in this subset. For this reason events
correspond to subsets of S.

Two events of special interest are the certain event, S, which consists of all outcomes
and hence always occurs, and the impossible or null event, ∅, which contains no
outcomes and hence never occurs.


Example 2.3 
In the following examples, Ak refers to an event corresponding to Experiment Ek in Example 2.1.


E1: “An even-numbered ball is selected,” A1 = {2, 4, ..., 48, 50}.

E2: “The ball is white and even-numbered,” A2 = {(4, w)}.

E3: “The three tosses give the same outcome,” A3 = {HHH, TTT}.

E4: “The number of heads equals the number of tails,” A4 = ∅.

E5: “No active packets are produced,” A5 = {0}.

E6: “Fewer than 10 transmissions are required,” A6 = {1, ..., 9}.

E7: “The number selected is nonnegative,” A7 = S7.

E8: “Less than t0 seconds elapse between page requests,” A8 = {t : 0 ≤ t < t0} = [0, t0).

E9: “The chip lasts more than 1000 hours but fewer than 1500 hours,” A9 = {t : 1000 < t < 1500}
= (1000, 1500).

E10: “The absolute value of the voltage is less than 1 volt,” A10 = {v : −1 < v < 1} = (−1, 1).

E11: “The two voltages have opposite polarities,” A11 = {(v1, v2) : (v1 < 0 and v2 > 0) or (v1 > 0
and v2 < 0)}.

E12: “The two numbers differ by less than 1/10,” A12 = {(x, y) : (x, y) in S12 and |x − y| < 1/10}.

E13: “The two numbers differ by less than 1/10,” A13 = {(x, y) : (x, y) in S13 and |x − y| < 1/10}.

E14: “The system is functioning at time t1,” A14 = subset of S14 for which X(t1) = 1.


¹The Cartesian product of the sets A and B consists of the set of all ordered pairs (a, b), where the first
element is taken from A and the second from B.


An event may consist of a single outcome, as in A2 and A5. An event from a
discrete sample space that consists of a single outcome is called an elementary event.
Events A2 and A5 are elementary events. An event may also consist of the entire sample
space, as in A7. The null event, ∅, arises when none of the outcomes satisfy the conditions
that specify a given event, as in A4.
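
The idea that an event is the subset of outcomes satisfying a condition can be sketched in Python. This is a hedged illustration for the discrete experiments E3 and E4 only; the helper names `event` and `occurs` are hypothetical and do not come from the text.

```python
from itertools import product

S3 = {"".join(seq) for seq in product("HT", repeat=3)}
S4 = {0, 1, 2, 3}

def event(sample_space, condition):
    """Return the subset of outcomes that satisfy the given condition."""
    return {outcome for outcome in sample_space if condition(outcome)}

def occurs(ev, outcome):
    """An event occurs if and only if the observed outcome lies in the subset."""
    return outcome in ev

# A3: "the three tosses give the same outcome".
A3 = event(S3, lambda s: len(set(s)) == 1)

# A4: "the number of heads equals the number of tails".
# With three tosses no outcome satisfies this, so the event is the null event.
A4 = event(S4, lambda heads: heads == 3 - heads)
```

Here `A3` contains two outcomes and `A4` is empty, matching the corresponding entries of the example.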


Review of Set Theory 


In random experiments we are interested in the occurrence of events that are repre- 
sented by sets. We can combine events using set operations to obtain other events. We 
can also express complicated events as combinations of simple events. Before proceed- 
ing with further discussion of events and random experiments, we present some essen- 
tial concepts from set theory. 

A set is a collection of objects and will be denoted by capital letters S, A, B, ....
We define U as the universal set that consists of all possible objects of interest in a
given setting or application. In the context of random experiments we refer to the universal
set as the sample space. For example, the universal set in Experiment E6 is
U = {1, 2, ...}. A set A is a collection of objects from U, and these objects are called
the elements or points of the set A and will be denoted by lowercase letters,
ζ, a, b, x, y, .... We use the notation:


x ∈ A and x ∉ A


to indicate that “x is an element of A” or “x is not an element of A,” respectively.

We use Venn diagrams when discussing sets. A Venn diagram is an illustration of 
sets and their interrelationships. The universal set U is usually represented as the set of 
all points within a rectangle as shown in Fig. 2.2(a). The set A is then the set of points 
within an enclosed region inside the rectangle. 

We say A is a subset of B if every element of A also belongs to B, that is, if x ∈ A
implies x ∈ B. We say that “A is contained in B” and we write:


A ⊂ B.


If A is a subset of B, then the Venn diagram shows the region for A to be inside the
region for B as shown in Fig. 2.2(e).
FIGURE 2.2
Set operations and set relations: (a) A ∪ B; (b) A ∩ B; (c) A^c; (d) A ∩ B = ∅; (e) A ⊂ B;
(f) A − B; (g) (A ∪ B)^c; (h) A^c ∩ B^c.


Example 2.4 


In Experiment E6 three sets of interest might be A = {x : x ≥ 10} = {10, 11, ...}, that is, 10 or
more transmissions are required; B = {2, 4, 6, ...}, the number of transmissions is an even number;
and C = {x : x ≥ 20} = {20, 21, ...}. Which of these sets are subsets of the others?

Clearly, C is a subset of A (C ⊂ A). However, C is not a subset of B, and B is not a subset
of C, because both sets contain elements the other set does not contain. Similarly, B is not a subset
of A, and A is not a subset of B.


The empty set ∅ is defined as the set with no elements. The empty set ∅ is a subset
of every set, that is, for any set A, ∅ ⊂ A.

We say sets A and B are equal if they contain the same elements. Since every element
in A is also in B, then x ∈ A implies x ∈ B, so A ⊂ B. Similarly every element in B
is also in A, so x ∈ B implies x ∈ A and so B ⊂ A. Therefore:


A = B if and only if A ⊂ B and B ⊂ A.


The standard method to show that two sets, A and B, are equal is to show that
A ⊂ B and B ⊂ A. A second method is to list all the items in A and all the items in B,
and to show that the items are the same. A variation of this second method is to use a
Venn diagram to identify the region that corresponds to A and to then show that the
Venn diagram for B occupies the same region. We provide examples of both methods
shortly.

We will use three basic operations on sets. The union and the intersection opera- 
tions are applied to two sets and produce a third set. The complement operation is ap- 
plied to a single set to produce another set. 

The union of two sets A and B is denoted by A ∪ B and is defined as the set of
outcomes that are either in A or in B, or both:


A ∪ B = {x : x ∈ A or x ∈ B}.


The operation A ∪ B corresponds to the logical “or” of the properties that define set A
and set B, that is, x is in A ∪ B if x satisfies the property that defines A, or x satisfies the
property that defines B, or both. The Venn diagram for A ∪ B consists of the shaded
region in Fig. 2.2(a).

The intersection of two sets A and B is denoted by A ∩ B and is defined as the set
of outcomes that are in both A and B:


A ∩ B = {x : x ∈ A and x ∈ B}.


The operation A ∩ B corresponds to the logical “and” of the properties that define
set A and set B. The Venn diagram for A ∩ B consists of the double shaded region
in Fig. 2.2(b). Two sets are said to be disjoint or mutually exclusive if their intersection
is the null set, A ∩ B = ∅. Figure 2.2(d) shows two mutually exclusive sets A
and B.

The complement of a set A is denoted by A^c and is defined as the set of all elements
not in A:


A^c = {x : x ∉ A}.


The operation A^c corresponds to the logical “not” of the property that defines set A.
Figure 2.2(c) shows A^c. Note that S^c = ∅ and ∅^c = S.

The relative complement or difference of sets A and B is the set of elements in A
that are not in B:


A − B = {x : x ∈ A and x ∉ B}.


A − B is obtained by removing from A all the elements that are also in B, as illustrated
in Fig. 2.2(f). Note that A − B = A ∩ B^c. Note also that B^c = S − B.


Example 2.5 


Let A, B, and C be the events from Experiment E6 in Example 2.4. Find the following events:
A ∪ B, A ∩ B, A^c, B^c, A − B, and B − A.


A ∪ B = {2, 4, 6, 8, 10, 11, 12, ...};
A ∩ B = {10, 12, 14, ...};
A^c = {x : x < 10} = {1, 2, ..., 9};
B^c = {1, 3, 5, ...};
A − B = {11, 13, 15, ...};
and B − A = {2, 4, 6, 8}.
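
The set operations in this example can be checked numerically. The sketch below is an illustration under one explicit assumption: the countably infinite sample space {1, 2, 3, ...} is truncated at a hypothetical bound N = 30 so that the sets are finite. The variable names are ours.

```python
# Truncate the sample space {1, 2, 3, ...} at N (an assumption made for illustration).
N = 30
U = set(range(1, N + 1))

A = {x for x in U if x >= 10}      # 10 or more transmissions
B = {x for x in U if x % 2 == 0}   # even number of transmissions
C = {x for x in U if x >= 20}

union = A | B
intersection = A & B
A_complement = U - A               # the complement is taken relative to U
B_complement = U - B
A_minus_B = A - B
B_minus_A = B - A

# Subset relations from the preceding example.
C_in_A = C <= A
C_in_B = C <= B
```

Within the truncated range the results agree with the sets listed above; for instance `B_minus_A` is {2, 4, 6, 8}, and `A_minus_B` equals `A & B_complement`.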


The three basic set operations can be combined to form other sets. The following 
properties of set operations are useful in deriving new expressions for combinations 
of sets: 


Commutative properties:

A ∪ B = B ∪ A and A ∩ B = B ∩ A. (2.1)

Associative properties:

A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C. (2.2)

Distributive properties:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). (2.3)


By applying the above properties we can derive new identities. DeMorgan’s rules provide
an important such example:


DeMorgan’s rules:

(A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c (2.4)
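
As a sanity check (not a proof), the identities (2.1) through (2.4) can be tested exhaustively on a small universal set. The following Python sketch assumes a hypothetical four-element universe; the helper names `all_subsets`, `comp`, and `identities_hold` are ours.

```python
from itertools import combinations, product

U = frozenset({1, 2, 3, 4})   # small illustrative universal set (an assumption)

def all_subsets(universe):
    """List every subset of the universe as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(A):
    """Complement relative to U."""
    return U - A

def identities_hold(A, B, C):
    return all([
        A | B == B | A and A & B == B & A,                          # (2.1)
        A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C,  # (2.2)
        A | (B & C) == (A | B) & (A | C),                           # (2.3)
        A & (B | C) == (A & B) | (A & C),                           # (2.3)
        comp(A | B) == comp(A) & comp(B),                           # (2.4)
        comp(A & B) == comp(A) | comp(B),                           # (2.4)
    ])

subsets = all_subsets(U)
all_ok = all(identities_hold(A, B, C)
             for A, B, C in product(subsets, repeat=3))
```

The check runs over all 16 × 16 × 16 triples of subsets. Passing it on one finite universe only illustrates the identities; the general statements still require the element-wise arguments used in the text.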


Example 2.6 


Prove DeMorgan’s rules by using Venn diagrams and by demonstrating set equality.

First we will use a Venn diagram to show the first equality. The shaded region in Fig. 2.2(g)
shows the complement of A ∪ B, the left-hand side of the equation. The cross-hatched region in
Fig. 2.2(h) shows the intersection of A^c and B^c. The two regions are the same and so the sets are
equal. Try sketching the Venn diagrams for the second equality in Eq. (2.4).

Next we prove DeMorgan’s rules by proving set equality. The proof has two parts: First we
show that (A ∪ B)^c ⊂ A^c ∩ B^c; then we show that A^c ∩ B^c ⊂ (A ∪ B)^c. Together these results
imply (A ∪ B)^c = A^c ∩ B^c.

First, suppose that x ∈ (A ∪ B)^c, then x ∉ A ∪ B. In particular, we have x ∉ A, which implies
x ∈ A^c. Similarly, we have x ∉ B, which implies x ∈ B^c. Hence x is in both A^c and B^c, that is,
x ∈ A^c ∩ B^c. We have shown that (A ∪ B)^c ⊂ A^c ∩ B^c.

To prove inclusion in the other direction, suppose that x ∈ A^c ∩ B^c. This implies that
x ∈ A^c, so x ∉ A. Similarly, x ∈ B^c and so x ∉ B. Therefore, x ∉ (A ∪ B) and so x ∈ (A ∪ B)^c. We
have shown that A^c ∩ B^c ⊂ (A ∪ B)^c. This proves that (A ∪ B)^c = A^c ∩ B^c.

To prove the second DeMorgan rule, apply the first DeMorgan rule to A^c and B^c to
obtain:


(A^c ∪ B^c)^c = (A^c)^c ∩ (B^c)^c = A ∩ B,


where we used the identity A = (A^c)^c. Now take complements of both sides of the above
equation:


A^c ∪ B^c = (A ∩ B)^c.
Example 2.7

For Experiment E10, let the sets A, B, and C be defined by
A = {v : |v| > 10}, “magnitude of v is greater than 10 volts,”
B = {v : v < −5}, “v is less than −5 volts,”
C = {v : v > 0}, “v is positive.”


You should then verify that
A ∪ B = {v : v < −5 or v > 10},
A ∩ B = {v : v < −10},
C^c = {v : v ≤ 0},
(A ∪ B) ∩ C = {v : v > 10},
A ∩ B ∩ C = ∅, and
(A ∪ B)^c = {v : −5 ≤ v ≤ 10}.
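
One way to carry out the suggested verification is to test the claimed descriptions on sample voltages. The sketch below is an illustration only: the sample space is the whole real line, so a finite grid of test points (an assumption) cannot prove the identities, but it does catch mistakes at the boundary values. The function names are hypothetical.

```python
# Finite grid of test voltages from -20 to 20 in steps of 0.25 (an assumption).
grid = [k / 4 for k in range(-80, 81)]

def in_A(v):
    return abs(v) > 10      # magnitude greater than 10 volts

def in_B(v):
    return v < -5           # less than -5 volts

def in_C(v):
    return v > 0            # positive

checks = all(
    ((in_A(v) or in_B(v)) == (v < -5 or v > 10))             # A ∪ B
    and ((in_A(v) and in_B(v)) == (v < -10))                 # A ∩ B
    and ((not in_C(v)) == (v <= 0))                          # C^c
    and (((in_A(v) or in_B(v)) and in_C(v)) == (v > 10))     # (A ∪ B) ∩ C
    and (not (in_A(v) and in_B(v) and in_C(v)))              # A ∩ B ∩ C = ∅
    and ((not (in_A(v) or in_B(v))) == (-5 <= v <= 10))      # (A ∪ B)^c
    for v in grid
)
```

The grid includes the boundary points −10, −5, 0, and 10, which is where the strict and non-strict inequalities matter.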


The union and intersection operations can be repeated for an arbitrary number
of sets. Thus the union of n sets

\[ \bigcup_{k=1}^{n} A_k = A_1 \cup A_2 \cup \cdots \cup A_n \tag{2.5} \]

is the set that consists of all elements that are in Ak for at least one value of k. The same
definition applies to the union of a countably infinite sequence of sets:

\[ \bigcup_{k=1}^{\infty} A_k. \tag{2.6} \]

The intersection of n sets

\[ \bigcap_{k=1}^{n} A_k = A_1 \cap A_2 \cap \cdots \cap A_n \tag{2.7} \]

is the set that consists of elements that are in all of the sets A1, ..., An. The same definition
applies to the intersection of a countably infinite sequence of sets:

\[ \bigcap_{k=1}^{\infty} A_k. \tag{2.8} \]


We will see that countable unions and intersections of sets are essential in dealing with 
sample spaces that are not finite. 
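
For a finite family of sets, the repeated union and intersection of Eqs. (2.5) and (2.7) can be sketched by folding the two-set operations over a list. The family below is hypothetical and chosen only for illustration; countably infinite families are outside what this finite sketch can represent.

```python
from functools import reduce

# Hypothetical finite family A_1, ..., A_4 (illustrative values only).
sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {3, 6}]

# Elements that are in A_k for at least one k.
union_all = reduce(set.union, sets)

# Elements that are in every A_k.
intersection_all = reduce(set.intersection, sets)
```

Because the two-set operations are associative, Eq. (2.2), the order in which the fold combines the sets does not affect the result.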


Event Classes 


We have introduced the sample space S as the set of all possible outcomes of the random
experiment. We have also introduced events as subsets of S. Probability theory
also requires that we state the class F of events of interest. Only events in this class
are assigned probabilities. We expect that any set operation on events in F will produce
a set that is also an event in F. In particular, we insist that complements, as well
as countable unions and intersections of events in F, i.e., Eqs. (2.1) and (2.5) through
(2.8), result in events in F. When the sample space S is finite or countable, we simply
let F consist of all subsets of S and we can proceed without further concerns about F.
However, when S is the real line R (or an interval of the real line), we cannot let F be
all possible subsets of R and still satisfy the axioms of probability. Fortunately, we can
obtain all the events of practical interest by letting F be the class of events obtained
as complements and countable unions and intersections of intervals of the real
line, e.g., (a, b] or (−∞, b]. We will refer to this class of events as the Borel field. In the
remainder of the book, we will refer to the event class F from time to time. For the
introductory-level course in probability you will not need to know more than what is
stated in this paragraph.

When we speak of a class of events we are referring to a collection (set) of events
(sets), that is, we are speaking of a “set of sets.” We refer to the collection of sets as a
class to remind us that the elements of the class are sets. We use script capital letters to
refer to a class, e.g., C, F, G. If the class C consists of the collection of sets A1, ..., Ak,
then we write C = {A1, ..., Ak}.


Example 2.8 


Let S = {T, H} be the sample space of a coin toss. Let every subset of S be an event. Find all
possible events of S.
An event is a subset of S, so we need to find all possible subsets of S. These are:


𝒮 = {∅, {H}, {T}, {H, T}}.


Note that 𝒮 includes both the empty set and S. Let i_T and i_H be binary numbers where i = 1
indicates that the corresponding element of S is in a given subset. We generate all possible subsets
by taking all possible values of the pair i_T and i_H. Thus i_T = 0, i_H = 1 corresponds to the set
{H}. Clearly there are 2^2 possible subsets as listed above.


For a finite sample space, S = {1, 2, ..., k},² we usually allow all subsets of S to be
events. This class of events is called the power set of S and we will denote it by 𝒮. We can
index all possible subsets of S with binary numbers i1, i2, ..., ik, and we find that the
power set of S has 2^k members. Because of this, the power set is also denoted by 𝒮 = 2^S.

Section 2.8 discusses some of the fine points on event classes. 
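
The binary-index construction of the power set can be sketched in Python as follows. This is a minimal illustration; the function name `power_set` and the use of the bits of an integer to represent the indicators are our assumptions, not notation from the text.

```python
def power_set(elements):
    """Enumerate all subsets by running through every binary index pattern."""
    elements = list(elements)
    k = len(elements)
    subsets = []
    for index in range(2 ** k):
        # Bit j of index acts as the binary indicator for element j:
        # 1 means the element is included in the subset.
        subset = frozenset(elements[j] for j in range(k) if (index >> j) & 1)
        subsets.append(subset)
    return subsets

# The coin-toss sample space from the preceding example.
coin_events = power_set(["T", "H"])
```

Each of the 2^k index values produces a distinct subset, which is why the power set of a k-element sample space has 2^k members.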


2.2 THE AXIOMS OF PROBABILITY


Probabilities are numbers assigned to events that indicate how “likely” it is that the
events will occur when an experiment is performed. A probability law for a random experiment
is a rule that assigns probabilities to the events of the experiment that belong
to the event class F. Thus a probability law is a function that assigns a number to sets
(events). In Section 1.3 we found a number of properties of relative frequency that any
definition of probability should satisfy. The axioms of probability formally state that a
probability law must satisfy these properties. In this section, we develop a number of
results that follow from this set of axioms.

²The discussion applies to any finite sample space with arbitrary objects S = {x1, ..., xk}, but we consider
{1, 2, ..., k} for notational simplicity.

Let E be a random experiment with sample space S and event class F. A
probability law for the experiment E is a rule that assigns to each event A ∈ F a
number P[A], called the probability of A, that satisfies the following axioms:


Axiom I      0 ≤ P[A]

Axiom II     P[S] = 1

Axiom III    If A ∩ B = ∅, then P[A ∪ B] = P[A] + P[B].

Axiom III′   If A1, A2, ... is a sequence of events such that
Ai ∩ Aj = ∅ for all i ≠ j, then

\[ P\left[\bigcup_{k=1}^{\infty} A_k\right] = \sum_{k=1}^{\infty} P[A_k]. \]


Axioms I, II, and III are enough to deal with experiments with finite sample
spaces. In order to handle experiments with infinite sample spaces, Axiom III needs to
be replaced by Axiom III′. Note that Axiom III′ includes Axiom III as a special case,
by letting Ak = ∅ for k ≥ 3. Thus we really only need Axioms I, II, and III′. Nevertheless
we will gain greater insight by starting with Axioms I, II, and III.

The axioms allow us to view events as objects possessing a property (i.e., their 
probability) that has attributes similar to physical mass. Axiom I states that the proba- 
bility (mass) is nonnegative, and Axiom II states that there is a fixed total amount of 
probability (mass), namely 1 unit. Axiom III states that the total probability (mass) in 
two disjoint objects is the sum of the individual probabilities (masses). 

The axioms provide us with a set of consistency rules that any valid probability 
assignment must satisfy. We now develop several properties stemming from the axioms 
that are useful in the computation of probabilities. 
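
The consistency rules can be checked mechanically for a small example. The sketch below is a hedged illustration, not a general method: it assumes a finite sample space S = {1, 2, 3, 4}, illustrative outcome weights, and a probability law obtained by summing those weights over the outcomes in an event. All names are ours.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative finite sample space and outcome weights (assumed values).
S = frozenset({1, 2, 3, 4})
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def P(event):
    """Assign to an event the sum of the weights of its outcomes."""
    return sum((p[outcome] for outcome in event), Fraction(0))

# The event class: all subsets of S.
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(sorted(S), r)]

axiom_1 = all(P(A) >= 0 for A in events)                  # nonnegativity
axiom_2 = (P(S) == 1)                                     # total probability is 1
axiom_3 = all(P(A | B) == P(A) + P(B)                     # additivity for
              for A in events for B in events             # disjoint events
              if not (A & B))
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding when testing the equalities. Changing the weights so that they no longer sum to one makes `axiom_2` fail, which is the kind of inconsistency the axioms are meant to rule out.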

The first result states that if we partition the sample space into two mutually exclusive
events, A and A^c, then the probabilities of these two events add up to one.


Corollary 1
P[A^c] = 1 − P[A]


Proof: Since an event A and its complement A^c are mutually exclusive, A ∩ A^c = ∅, we have
from Axiom III that

P[A ∪ A^c] = P[A] + P[A^c].

Since S = A ∪ A^c, by Axiom II,

1 = P[S] = P[A ∪ A^c] = P[A] + P[A^c].

The corollary follows after solving for P[A^c].


The next corollary states that the probability of an event is always less than or
equal to one. Corollary 2 combined with Axiom I provides good checks in problem
solving: If your probabilities are negative or are greater than one, you have made a
mistake somewhere!


Corollary 2
P[A] ≤ 1

Proof: From Corollary 1,

P[A] = 1 − P[A^c] ≤ 1,

since P[A^c] ≥ 0.

Corollary 3 states that the impossible event has probability zero.

Corollary 3
P[∅] = 0


Proof: Let A = S and A^c = ∅ in Corollary 1:
P[∅] = 1 − P[S] = 0.


Corollary 4 provides us with the standard method for computing the probability
of a complicated event A. The method involves decomposing the event A into the
union of disjoint events A1, A2, ..., An. The probability of A is the sum of the probabilities
of the Ak’s.


Corollary 4


If A1, A2, ..., An are pairwise mutually exclusive, then

\[ P\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{k=1}^{n} P[A_k] \quad \text{for } n \ge 2. \]


Proof: We use mathematical induction. Axiom III implies that the result is true for n = 2. Next
we need to show that if the result is true for some n, then it is also true for n + 1. This, combined
with the fact that the result is true for n = 2, implies that the result is true for n ≥ 2.

Suppose that the result is true for some n ≥ 2; that is,

\[ P\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{k=1}^{n} P[A_k], \tag{2.9} \]

and consider the n + 1 case

\[ P\left[\bigcup_{k=1}^{n+1} A_k\right] = P\left[\left\{\bigcup_{k=1}^{n} A_k\right\} \cup A_{n+1}\right] = P\left[\bigcup_{k=1}^{n} A_k\right] + P[A_{n+1}], \tag{2.10} \]

where we have applied Axiom III to the second expression after noting that the union of events
A1 to An is mutually exclusive with A_{n+1}. The distributive property then implies

\[ \left\{\bigcup_{k=1}^{n} A_k\right\} \cap A_{n+1} = \bigcup_{k=1}^{n} \{A_k \cap A_{n+1}\} = \bigcup_{k=1}^{n} \emptyset = \emptyset. \]

Substitution of Eq. (2.9) into Eq. (2.10) gives the n + 1 case

\[ P\left[\bigcup_{k=1}^{n+1} A_k\right] = \sum_{k=1}^{n+1} P[A_k]. \]


Corollary 5 gives an expression for the union of two events that are not necessar- 
ily mutually exclusive. 


Corollary 5
P[A ∪ B] = P[A] + P[B] − P[A ∩ B]


Proof: First we decompose A ∪ B, A, and B as unions of disjoint events. From the Venn diagram
in Fig. 2.3,


P[A ∪ B] = P[A ∩ B^c] + P[B ∩ A^c] + P[A ∩ B]
P[A] = P[A ∩ B^c] + P[A ∩ B]
P[B] = P[B ∩ A^c] + P[A ∩ B]


By substituting P[A ∩ B^c] and P[B ∩ A^c] from the two lower equations into the top equation,
we obtain the corollary.


By looking at the Venn diagram in Fig. 2.3, you will see that the sum P[A] + P[B]
counts the probability (mass) of the set A ∩ B twice. The expression in Corollary 5
makes the appropriate correction.

Corollary 5 is easily generalized to three events,


P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B]
               − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C], (2.11)


and in general to n events, as shown in Corollary 6.


FIGURE 2.3
Decomposition of A ∪ B into three disjoint sets.
Corollary 6

\[ P\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} P[A_j] - \sum_{j<k} P[A_j \cap A_k] + \cdots + (-1)^{n+1} P[A_1 \cap \cdots \cap A_n]. \]


Proof is by induction (see Problems 2.26 and 2.27).


Since probabilities are nonnegative, Corollary 5 implies that the probability
of the union of two events is no greater than the sum of the individual event
probabilities:

P[A ∪ B] ≤ P[A] + P[B]. (2.12)

The above inequality is a special case of the fact that a subset of another set cannot
have a larger probability than the set that contains it. This result is frequently used to
obtain upper bounds for probabilities of interest. In the typical situation, we are
interested in an event A whose probability is difficult to find; so we find an event B for
which the probability can be found and that includes A as a subset.


Corollary 7
If A ⊂ B, then P[A] ≤ P[B].

Proof: In Fig. 2.4, B is the union of A and A^c ∩ B, thus

P[B] = P[A] + P[A^c ∩ B] ≥ P[A],

since P[A^c ∩ B] ≥ 0.


The axioms together with the corollaries provide us with a set of rules for comput- 
ing the probability of certain events in terms of other events. However, we still need an 
initial probability assignment for some basic set of events from which the probability of 
all other events can be computed. This problem is dealt with in the next two subsections. 


FIGURE 2.4
If A ⊂ B, then P(A) ≤ P(B).


Discrete Sample Spaces


In this section we show that the probability law for an experiment with a countable sample
space can be specified by giving the probabilities of the elementary events. First, suppose
that the sample space is finite, S = {a1, a2, ..., an} and let F consist of all subsets
of S. All distinct elementary events are mutually exclusive, so by Corollary 4 the probability
of any event B = {a1', a2', ..., am'} is given by


P[B] = P[{a1', a2', ..., am'}]
     = P[{a1'}] + P[{a2'}] + ⋯ + P[{am'}]; (2.13)


that is, the probability of an event is equal to the sum of the probabilities of the outcomes
in the event. Thus we conclude that the probability law for a random experiment with a finite
sample space is specified by giving the probabilities of the elementary events.

If the sample space has n elements, S = {a1, ..., an}, a probability assignment of
particular interest is the case of equally likely outcomes. The probability of the elementary
events is


P[{a1}] = P[{a2}] = ⋯ = P[{an}] = 1/n. (2.14)

The probability of any event that consists of k outcomes, say B = {a1', ..., ak'}, is

P[B] = P[{a1'}] + ⋯ + P[{ak'}] = k/n. (2.15)


Thus if outcomes are equally likely, then the probability of an event is equal to the number
of outcomes in the event divided by the total number of outcomes in the sample
space. Section 2.3 discusses counting methods that are useful in finding probabilities in
experiments that have equally likely outcomes.

Consider the case where the sample space is countably infinite, S = {a1, a2, ...}.
Let the event class F be the class of all subsets of S. Note that F must now satisfy Eq. (2.8)
because events can consist of countable unions of sets. Axiom III′ implies that the
probability of an event such as D = {b1, b2, b3, ...} is given by


P[D] = P[{b1, b2, b3, ...}] = P[{b1}] + P[{b2}] + P[{b3}] + ⋯.

The probability of an event with a countably infinite sample space is determined from
the probabilities of the elementary events.


Example 2.9 


An urn contains 10 identical balls numbered 0, 1,..., 9. A random experiment involves selecting a 
ball from the urn and noting the number of the ball. Find the probability of the following events: 


A = “number of ball selected is odd,” 
B = “number of ball selected is a multiple of 3,” 


C = “number of ball selected is less than 5,” 


and of $A \cup B$ and $A \cup B \cup C$.

The sample space is $S = \{0, 1, \ldots, 9\}$, so the sets of outcomes corresponding to the above
events are 


A= {1,3,5,7,9}, B= {3,6,9}, and C= {0,1,2,3,4}. 


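If we assume that the outcomes are equally likely, then

$$P[A] = P[\{1\}] + P[\{3\}] + P[\{5\}] + P[\{7\}] + P[\{9\}] = \frac{5}{10},$$

$$P[B] = P[\{3\}] + P[\{6\}] + P[\{9\}] = \frac{3}{10},$$

$$P[C] = P[\{0\}] + P[\{1\}] + P[\{2\}] + P[\{3\}] + P[\{4\}] = \frac{5}{10}.$$

From Corollary 5,

$$P[A \cup B] = P[A] + P[B] - P[A \cap B] = \frac{5}{10} + \frac{3}{10} - \frac{2}{10} = \frac{6}{10},$$

where we have used the fact that $A \cap B = \{3, 9\}$, so $P[A \cap B] = 2/10$. From Corollary 6,

$$\begin{aligned}
P[A \cup B \cup C] &= P[A] + P[B] + P[C] - P[A \cap B] - P[A \cap C] - P[B \cap C] + P[A \cap B \cap C] \\
&= \frac{5}{10} + \frac{3}{10} + \frac{5}{10} - \frac{2}{10} - \frac{2}{10} - \frac{1}{10} + \frac{1}{10} = \frac{9}{10}.
\end{aligned}$$

You should verify the answers for $P[A \cup B]$ and $P[A \cup B \cup C]$ by enumerating the outcomes in
the events.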
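
The enumeration suggested in Example 2.9 can be sketched in a few lines of Python. This is only an illustration under the equally-likely assumption of the example; the helper name `prob` and the variable names are choices made here, not part of the text.

```python
from fractions import Fraction

S = set(range(10))                    # balls numbered 0, 1, ..., 9
A = {1, 3, 5, 7, 9}                   # number is odd
B = {3, 6, 9}                         # multiples of 3, as listed in the example
C = {0, 1, 2, 3, 4}                   # number is less than 5

def prob(event):
    """Equally likely outcomes: |event| / |S|, as in Eq. (2.15)."""
    return Fraction(len(event), len(S))

# Direct enumeration of the unions
p_union_ab = prob(A | B)
p_union_abc = prob(A | B | C)

# The same quantity from Corollary 6 (inclusion-exclusion for three events)
incl_excl = (prob(A) + prob(B) + prob(C)
             - prob(A & B) - prob(A & C) - prob(B & C)
             + prob(A & B & C))

print(p_union_ab, p_union_abc, incl_excl)
```

Exact fractions are used so that the two routes to $P[A \cup B \cup C]$ can be compared without rounding.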


Many probability models can be devised for the same sample space and events by 
varying the probability assignment; in the case of finite sample spaces all we need to do 
is come up with n nonnegative numbers that add up to one for the probabilities of the 
elementary events. Of course, in any particular situation, the probability assignment 
should be selected to reflect experimental observations to the extent possible. The fol- 
lowing example shows that situations can arise where there is more than one “reason- 
able” probability assignment and where experimental evidence is required to decide 
on the appropriate assignment. 


Example 2.10 


Suppose that a coin is tossed three times. If we observe the sequence of heads and tails, then
there are eight possible outcomes $S_3 = \{HHH, HHT, HTH, THH, TTH, THT, HTT, TTT\}$. If
we assume that the outcomes of $S_3$ are equiprobable, then the probability of each of the eight
elementary events is 1/8. This probability assignment implies that the probability of obtaining two
heads in three tosses is, by Corollary 4,

$$P[\text{“2 heads in 3 tosses”}] = P[\{HHT, HTH, THH\}] = P[\{HHT\}] + P[\{HTH\}] + P[\{THH\}] = \frac{3}{8}.$$




Now suppose that we toss a coin three times but we count the number of heads in three
tosses instead of observing the sequence of heads and tails. The sample space is now
$S_4 = \{0, 1, 2, 3\}$. If we assume the outcomes of $S_4$ to be equiprobable, then each of the
elementary events of $S_4$ has probability 1/4. This second probability assignment predicts that the
probability of obtaining two heads in three tosses is

$$P[\text{“2 heads in 3 tosses”}] = P[\{2\}] = \frac{1}{4}.$$


The first probability assignment implies that the probability of two heads in three toss- 
es is 3/8, and the second probability assignment predicts that the probability is 1/4. Thus the 
two assignments are not consistent with each other. As far as the theory is concerned, either 
one of the assignments is acceptable. It is up to us to decide which assignment is more ap- 
propriate. Later in the chapter we will see that only the first assignment is consistent with 
the assumption that the coin is fair and that the tosses are “independent.” This assignment 
correctly predicts the relative frequencies that would be observed in an actual coin tossing 
experiment. 
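
The comparison in Example 2.10 can be illustrated with a small sketch that enumerates the eight sequences and then simulates fair, independent tosses. The seed, the number of trials, and all variable names are assumptions made for this illustration.

```python
import random
from fractions import Fraction
from itertools import product

# First assignment: the eight sequences are equiprobable
sequences = list(product("HT", repeat=3))
exact = Fraction(sum(seq.count("H") == 2 for seq in sequences), len(sequences))

# Simulated relative frequency with a fair coin and independent tosses
rng = random.Random(0)                 # fixed seed so the run is repeatable
n_trials = 100_000
hits = sum(
    sum(rng.random() < 0.5 for _ in range(3)) == 2
    for _ in range(n_trials)
)
rel_freq = hits / n_trials

print(exact, rel_freq)                 # the relative frequency should be near 3/8, not 1/4
```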


Finally we consider an example with a countably infinite sample space. 


Example 2.11 


A fair coin is tossed repeatedly until the first heads shows up; the outcome of the experiment is 
the number of tosses required until the first heads occurs. Find a probability law for this experi- 
ment. 

It is conceivable that an arbitrarily large number of tosses will be required until heads
occurs, so the sample space is $S = \{1, 2, 3, \ldots\}$. Suppose the experiment is repeated n times.
Let $N_j$ be the number of trials in which the jth toss results in the first heads. If n is very large,
we expect $N_1$ to be approximately n/2 since the coin is fair. This implies that a second toss is
necessary about $n - N_1 \approx n/2$ times, and again we expect that about half of these, that is,
n/4, will result in heads, and so on, as shown in Fig. 2.5. Thus for large n, the relative
frequencies are

$$f_j \approx \frac{N_j}{n} = \left(\frac{1}{2}\right)^j \qquad j = 1, 2, \ldots.$$

This suggests the following probability law for the experiment:

$$P[j \text{ tosses till first heads}] = \left(\frac{1}{2}\right)^j \qquad j = 1, 2, \ldots. \tag{2.16}$$

We can verify that these probabilities add up to one by using the geometric series with $\alpha = 1/2$:

$$\sum_{j=1}^{\infty} \alpha^j = \left.\frac{\alpha}{1 - \alpha}\right|_{\alpha = 1/2} = 1.$$


FIGURE 2.5
In n trials heads comes up in the first toss approximately n/2 times, in
the second toss approximately n/4 times, and so on.


Continuous Sample Spaces


Continuous sample spaces arise in experiments in which the outcomes are numbers
that can assume a continuum of values, so we let the sample space S be the entire real
line R (or some interval of the real line). We could consider letting the event class
consist of all subsets of R. But it turns out that this class is “too large” and it is impossible
to assign probabilities to all the subsets of R. Fortunately, it is possible to assign
probabilities to all events in a smaller class that includes all events of practical interest. This
class, denoted by B, is called the Borel field, and it contains all open and closed intervals
of the real line as well as all events that can be obtained as countable unions,
intersections, and complements.³ Axiom III′ is once again the key to calculating probabilities of
events. Let $A_1, A_2, \ldots$ be a sequence of mutually exclusive events that are represented
by intervals of the real line; then

$$P\left[\bigcup_{k=1}^{\infty} A_k\right] = \sum_{k=1}^{\infty} P[A_k],$$

where each $P[A_k]$ is specified by the probability law. For this reason, probability laws
in experiments with continuous sample spaces specify a rule for assigning numbers to
intervals of the real line.


Example 2.12 


Consider the random experiment “pick a number x at random between zero and one.” The sample 
space S for this experiment is the unit interval [0, 1], which is uncountably infinite. If we suppose that 
all the outcomes S are equally likely to be selected, then we would guess that the probability that the 
outcome is in the interval [0, 1/2] is the same as the probability that the outcome is in the interval 
[1/2, 1]. We would also guess that the probability of the outcome being exactly equal to 1/2 would be 
zero since there are an uncountably infinite number of equally likely outcomes. 


³Section 2.9 discusses B in more detail.

Consider the following probability law: “The probability that the outcome falls in a
subinterval of S is equal to the length of the subinterval,” that is,


$$P[[a, b]] = (b - a) \qquad \text{for } 0 \le a \le b \le 1, \tag{2.17}$$


where by P[[a, b]] we mean the probability of the event corresponding to the interval [a, b].
Clearly, Axiom I is satisfied since $b \ge a \ge 0$. Axiom II follows from S = [a, b] with a = 0 and
b = 1.

We now show that the probability law is consistent with the previous guesses about the 
probabilities of the events [0, 1/2], [1/2, 1], and {1/2}: 


P[[0,0.5]] = 0.5 - 0 = .5 
P[[0.5,1]] = 1 — 0.5 = .5 


In addition, if $x_0$ is any point in S, then $P[[x_0, x_0]] = 0$ since individual points have zero width.

Now suppose that we are interested in an event that is the union of several intervals; for 
example, “the outcome is at least 0.3 away from the center of the unit interval,” that is, 
$A = [0, 0.2] \cup [0.8, 1]$. Since the two intervals are disjoint, we have by Axiom III


P[A] = P[[0,0.2]] + P[[0.8,1]] = .4. 


The next example shows that an initial probability assignment that specifies the 
probability of semi-infinite intervals also suffices to specify the probabilities of all 
events of interest. 


Example 2.13 


Suppose that the lifetime of a computer memory chip is measured, and we find that “the
proportion of chips whose lifetime exceeds t decreases exponentially at a rate α.” Find an appropriate
probability law.

Let the sample space in this experiment be $S = (0, \infty)$. If we interpret the above finding
as “the probability that a chip’s lifetime exceeds t decreases exponentially at a rate α,” we then
obtain the following assignment of probabilities to events of the form $(t, \infty)$:


$$P[(t, \infty)] = e^{-\alpha t} \qquad \text{for } t > 0, \tag{2.18}$$


where $\alpha > 0$. Note that the exponential is a number between 0 and 1 for t > 0, so Axiom I is
satisfied. Axiom II is satisfied since

$$P[S] = P[(0, \infty)] = 1.$$

The probability that the lifetime is in the interval (r, s] is found by noting in Fig. 2.6 that
$(r, s] \cup (s, \infty) = (r, \infty)$, so by Axiom III,

$$P[(r, \infty)] = P[(r, s]] + P[(s, \infty)].$$


FIGURE 2.6
$(r, \infty) = (r, s] \cup (s, \infty)$.


By rearranging the above equation we obtain

$$P[(r, s]] = P[(r, \infty)] - P[(s, \infty)] = e^{-\alpha r} - e^{-\alpha s}.$$

We thus obtain the probability of arbitrary intervals in S.
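
As a rough numerical companion to Example 2.13, the following sketch evaluates the tail probabilities of Eq. (2.18) and the interval probabilities derived from them. The function names and the value chosen for the rate are hypothetical and serve only to illustrate the formulas.

```python
import math

def tail_prob(t, alpha):
    """P[(t, inf)] = exp(-alpha * t), the assignment in Eq. (2.18)."""
    return math.exp(-alpha * t)

def interval_prob(r, s, alpha):
    """P[(r, s]] = P[(r, inf)] - P[(s, inf)] for 0 <= r <= s."""
    return tail_prob(r, alpha) - tail_prob(s, alpha)

alpha = 0.5                            # hypothetical failure rate, chosen for illustration

# Additivity over the disjoint pieces (r, s] and (s, inf)
lhs = tail_prob(1.0, alpha)
rhs = interval_prob(1.0, 3.0, alpha) + tail_prob(3.0, alpha)
print(lhs, rhs)
```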


In both Example 2.12 and Example 2.13, the probability that the outcome takes on 
a specific value is zero. You may ask: If an outcome (or event) has probability zero, doesn’t 
that mean it cannot occur? And you may then ask: How can all the outcomes in a sam- 
ple space have probability zero? We can explain this paradox by using the relative 
frequency interpretation of probability. An event that occurs only once in an infinite num- 
ber of trials will have relative frequency zero. Hence the fact that an event or outcome has 
relative frequency zero does not imply that it cannot occur, but rather that it occurs very 
infrequently. In the case of continuous sample spaces, the set of possible outcomes is so 
rich that all outcomes occur infrequently enough that their relative frequencies are zero. 

We end this section with an example where the events are regions in the plane. 


Example 2.14 


Consider Experiment E12, where we picked two numbers x and y at random between zero and 
one. The sample space is then the unit square shown in Fig. 2.7(a). If we suppose that all pairs of 
numbers in the unit square are equally likely to be selected, then it is reasonable to use a proba- 
bility assignment in which the probability of any region R inside the unit square is equal to the 
area of R. Find the probability of the following events: A = {x > 0.5}, B = {y > 0.5}, and 
$C = \{x > y\}$.


FIGURE 2.7
A two-dimensional sample space and three events: (a) sample space, (b) event $\{x > 1/2\}$, (c) event $\{y > 1/2\}$, (d) event $\{x > y\}$.


Figures 2.7(b) through 2.7(d) show the regions corresponding to the events A, B, and C.
Clearly each of these regions has area 1/2. Thus

$$P[A] = \frac{1}{2}, \qquad P[B] = \frac{1}{2}, \qquad P[C] = \frac{1}{2}.$$
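
The area-based assignment of Example 2.14 can be checked approximately by simulation. The sketch below assumes that Python's `random.Random.random` is an adequate stand-in for "picking a number at random between zero and one"; the seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(1)                 # fixed seed so the run is repeatable
n = 200_000
count_a = count_b = count_c = 0
for _ in range(n):
    x, y = rng.random(), rng.random()  # one point in the unit square
    count_a += x > 0.5                 # event A = {x > 0.5}
    count_b += y > 0.5                 # event B = {y > 0.5}
    count_c += x > y                   # event C = {x > y}

estimates = {"A": count_a / n, "B": count_b / n, "C": count_c / n}
print(estimates)                       # each estimate should be close to the area 1/2
```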


We reiterate how to proceed from a problem statement to its probability model. 
The problem statement implicitly or explicitly defines a random experiment, which 
specifies an experimental procedure and a set of measurements and observations. 
These measurements and observations determine the set of all possible outcomes and 
hence the sample space S. 

An initial probability assignment that specifies the probability of certain events 
must be determined next. This probability assignment must satisfy the axioms of prob- 
ability. If S is discrete, then it suffices to specify the probabilities of elementary events. 
If S is continuous, it suffices to specify the probabilities of intervals of the real line or 
regions of the plane. The probability of other events of interest can then be determined 
from the initial probability assignment and the axioms of probability and their corol- 
laries. Many probability assignments are possible, so the choice of probability assign- 
ment must reflect experimental observations and/or previous experience. 


COMPUTING PROBABILITIES USING COUNTING METHODS* 


In many experiments with finite sample spaces, the outcomes can be assumed to be 
equiprobable. The probability of an event is then the ratio of the number of outcomes in 
the event of interest to the total number of outcomes in the sample space (Eq. (2.15)). 
The calculation of probabilities reduces to counting the number of outcomes in an 
event. In this section, we develop several useful counting (combinatorial) formulas. 

Suppose that a multiple-choice test has k questions and that for question i the
student must select one of $n_i$ possible answers. What is the total number of ways of
answering the entire test? The answer to question i can be viewed as specifying the ith
component of a k-tuple, so the above question is equivalent to: How many distinct
ordered k-tuples $(x_1, \ldots, x_k)$ are possible if $x_i$ is an element from a set with $n_i$ distinct
elements?

Consider the k = 2 case. If we arrange all possible choices for $x_1$ and for $x_2$ along
the sides of a table as shown in Fig. 2.8, we see that there are $n_1 n_2$ distinct ordered pairs.
For triplets we could arrange the $n_1 n_2$ possible pairs $(x_1, x_2)$ along the vertical side of
the table and the $n_3$ choices for $x_3$ along the horizontal side. Clearly, the number of
possible triplets is $n_1 n_2 n_3$.

In general, the number of distinct ordered k-tuples $(x_1, \ldots, x_k)$ with components
$x_i$ from a set with $n_i$ distinct elements is


$$\text{number of distinct ordered } k\text{-tuples} = n_1 n_2 \cdots n_k. \tag{2.19}$$
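
Equation (2.19) can be illustrated by listing the k-tuples explicitly. The sketch below uses a made-up test with k = 3 questions; the answer sets are hypothetical and only their sizes matter.

```python
from itertools import product
from math import prod

# Hypothetical multiple-choice test: n_1 = 3, n_2 = 2, n_3 = 4 possible answers
answer_sets = [("a", "b", "c"), ("T", "F"), ("a", "b", "c", "d")]

all_tuples = list(product(*answer_sets))          # every ordered 3-tuple of answers
predicted = prod(len(s) for s in answer_sets)     # n_1 * n_2 * n_3 from Eq. (2.19)

print(len(all_tuples), predicted)
```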


Many counting problems can be posed as sampling problems where we select 
“balls” from “urns” or “objects” from “populations.” We will now use Eq. (2.19) to de- 
velop combinatorial formulas for various types of sampling. 


*This section and all sections marked with an asterisk may be skipped without loss of continuity.


                           x_1
               a_1             a_2             ...   a_{n_1}
     b_1       (a_1, b_1)      (a_2, b_1)      ...   (a_{n_1}, b_1)
x_2  b_2       (a_1, b_2)      (a_2, b_2)      ...   (a_{n_1}, b_2)
     ...
     b_{n_2}   (a_1, b_{n_2})  (a_2, b_{n_2})  ...   (a_{n_1}, b_{n_2})

FIGURE 2.8
If there are $n_1$ distinct choices for $x_1$ and $n_2$ distinct choices
for $x_2$, then there are $n_1 n_2$ distinct ordered pairs $(x_1, x_2)$.


Sampling with Replacement and with Ordering 


Suppose we choose k objects from a set A that has n distinct objects, with replacement;
that is, after selecting an object and noting its identity in an ordered list, the object
is placed back in the set before the next choice is made. We will refer to the set A
as the “population.” The experiment produces an ordered k-tuple


$$(x_1, \ldots, x_k),$$

where $x_i \in A$ and $i = 1, \ldots, k$. Equation (2.19) with $n_1 = n_2 = \cdots = n_k = n$ implies that

$$\text{number of distinct ordered } k\text{-tuples} = n^k. \tag{2.20}$$


Example 2.15 


An urn contains five balls numbered 1 to 5. Suppose we select two balls from the urn with re- 
placement. How many distinct ordered pairs are possible? What is the probability that the two 
draws yield the same number? 

Equation (2.20) states that the number of ordered pairs is $5^2 = 25$. Table 2.1(a) shows the 25
possible pairs. Five of the 25 outcomes have the two draws yielding the same number; if we sup- 
pose that all pairs are equiprobable, then the probability that the two draws yield the same num- 
ber is 5/25 = .2. 


Sampling without Replacement and with Ordering 


Suppose we choose k objects in succession without replacement from a population A of
n distinct objects. Clearly, $k \le n$. The number of possible outcomes in the first draw is
$n_1 = n$; the number of possible outcomes in the second draw is $n_2 = n - 1$, namely all
n objects except the one selected in the first draw; and so on, up to $n_k = n - (k - 1)$ in
the final draw. Equation (2.19) then gives


$$\text{number of distinct ordered } k\text{-tuples} = n(n - 1) \cdots (n - k + 1). \tag{2.21}$$


TABLE 2.1 Enumeration of possible outcomes in various types of
sampling of two balls from an urn containing five distinct balls.


(a) Ordered pairs for sampling with replacement.

(1,1) (1,2) (1,3) (1,4) (1,5)
(2,1) (2,2) (2,3) (2,4) (2,5)
(3,1) (3,2) (3,3) (3,4) (3,5)
(4,1) (4,2) (4,3) (4,4) (4,5)
(5,1) (5,2) (5,3) (5,4) (5,5)

(b) Ordered pairs for sampling without replacement.

      (1,2) (1,3) (1,4) (1,5)
(2,1)       (2,3) (2,4) (2,5)
(3,1) (3,2)       (3,4) (3,5)
(4,1) (4,2) (4,3)       (4,5)
(5,1) (5,2) (5,3) (5,4)

(c) Pairs for sampling without replacement or ordering.

(1,2) (1,3) (1,4) (1,5)
      (2,3) (2,4) (2,5)
            (3,4) (3,5)
                  (4,5)


Example 2.16 


An urn contains five balls numbered 1 to 5. Suppose we select two balls in succession without re- 
placement. How many distinct ordered pairs are possible? What is the probability that the first 
ball has a number larger than that of the second ball? 

Equation (2.21) states that the number of ordered pairs is 5(4) = 20. The 20 possible
ordered pairs are shown in Table 2.1(b). Ten ordered pairs in Table 2.1(b) have the first number
larger than the second number; thus the probability of this event is 10/20 = 1/2.


Example 2.17 


An urn contains five balls numbered 1, 2,...,5. Suppose we draw three balls with replacement. 
What is the probability that all three balls are different? 

From Eq. (2.20) there are $5^3 = 125$ possible outcomes, which we will suppose are
equiprobable. The number of these outcomes for which the three draws are different is given 
by Eq. (2.21): 5(4)(3) = 60. Thus the probability that all three balls are different is 
60/125 = .48. 
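
The counts in Examples 2.15 through 2.17 can be reproduced by brute-force enumeration. The sketch below assumes equiprobable outcomes, as the examples do; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

balls = range(1, 6)                                 # urn with balls numbered 1..5

pairs_with = list(product(balls, repeat=2))         # with replacement, ordered: Eq. (2.20)
pairs_without = list(permutations(balls, 2))        # without replacement, ordered: Eq. (2.21)
triples_with = list(product(balls, repeat=3))       # three draws with replacement

# Example 2.15: both draws show the same number
p_same = Fraction(sum(a == b for a, b in pairs_with), len(pairs_with))

# Example 2.16: first number larger than the second
p_first_larger = Fraction(sum(a > b for a, b in pairs_without), len(pairs_without))

# Example 2.17: all three draws different
p_all_different = Fraction(sum(len(set(t)) == 3 for t in triples_with), len(triples_with))

print(p_same, p_first_larger, p_all_different)
```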


Permutations of n Distinct Objects 


Consider sampling without replacement with k = n. This is simply drawing objects
from an urn containing n distinct objects until the urn is empty. Thus, the number of
possible orderings (arrangements, permutations) of n distinct objects is equal to the
number of ordered n-tuples in sampling without replacement with k = n. From Eq. (2.21),
we have


$$\text{number of permutations of } n \text{ objects} = n(n - 1) \cdots (2)(1) = n!. \tag{2.22}$$


We refer to n! as n factorial.
We will see that n! appears in many of the combinatorial formulas. For large n,
Stirling’s formula is very useful:


$$n! \sim \sqrt{2\pi}\, n^{n + 1/2} e^{-n}, \tag{2.23}$$


where the sign ~ indicates that the ratio of the two sides tends to unity as $n \to \infty$
[Feller, p. 52].
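
How quickly the ratio in Stirling's formula approaches unity can be seen numerically. The following is a small sketch; the function name `stirling` and the sample values of n are choices made for this illustration.

```python
import math

def stirling(n: int) -> float:
    """Right-hand side of Eq. (2.23): sqrt(2*pi) * n**(n + 1/2) * exp(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

# Ratio n! / (Stirling approximation) for a few values of n
ratios = {n: math.factorial(n) / stirling(n) for n in (1, 5, 10, 50, 100)}

for n, r in ratios.items():
    print(n, r)
```

The printed ratios stay slightly above one and shrink toward one as n grows, which is the sense of the ~ sign in Eq. (2.23).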


Example 2.18 


Find the number of permutations of three distinct objects {1,2,3}. Equation (2.22) gives 
3! = 3(2)(1) = 6. The six permutations are 


123 312 231 132 213 321. 


Example 2.19 


Suppose that 12 balls are placed at random into 12 cells, where more than 1 ball is allowed to oc- 
cupy a cell. What is the probability that all cells are occupied? 

The placement of each ball into a cell can be viewed as the selection of a cell number
between 1 and 12. Equation (2.20) implies that there are $12^{12}$ possible placements of the 12 balls in
the 12 cells. In order for all cells to be occupied, the first ball selects from any of the 12 cells, the
second ball from the remaining 11 cells, and so on. Thus the number of placements that occupy
all cells is 12!. If we suppose that all $12^{12}$ possible placements are equiprobable, we find that the
probability that all cells are occupied is


$$\frac{12!}{12^{12}} = \left(\frac{12}{12}\right)\left(\frac{11}{12}\right) \cdots \left(\frac{1}{12}\right) = 5.37(10^{-5}).$$


This answer is surprising if we reinterpret the question as follows. Given that 12 airplane 
crashes occur at random in a year, what is the probability that there is exactly 1 crash each 
month? The above result shows that this probability is very small. Thus a model that assumes 
that crashes occur randomly in time does not predict that they tend to occur uniformly over time 
[Feller, p. 32]. 


Sampling without Replacement and without Ordering 


Suppose we pick k objects from a set of n distinct objects without replacement and that 
we record the result without regard to order. (You can imagine putting each selected 
object into another jar, so that when the k selections are completed we have no record 
of the order in which the selection was done.) We call the resulting subset of k selected 
objects a “combination of size k.” 

From Eq. (2.22), there are k! possible orders in which the k objects in the second
jar could have been selected. Thus if $C_k^n$ denotes the number of combinations of size k
from a set of size n, then $C_k^n k!$ must be the total number of distinct ordered samples of
k objects, which is given by Eq. (2.21). Thus


$$C_k^n k! = n(n - 1) \cdots (n - k + 1), \tag{2.24}$$

and the number of different combinations of size k from a set of size n, $k \le n$, is

$$C_k^n = \frac{n(n - 1) \cdots (n - k + 1)}{k!} = \frac{n!}{k!\,(n - k)!} \triangleq \binom{n}{k}. \tag{2.25}$$


The expression $\binom{n}{k}$ is called a binomial coefficient and is read “n choose k.”
Note that choosing k objects out of a set of n is equivalent to choosing the n − k
objects that are to be left out. It then follows that (also see Problem 2.60):

$$\binom{n}{k} = \binom{n}{n - k}.$$


Example 2.20 


Find the number of ways of selecting two objects from A = {1, 2, 3, 4, 5} without regard to order.
Equation (2.25) gives

$$\binom{5}{2} = \frac{5!}{2!\,3!} = 10.$$

Table 2.1(c) gives the 10 pairs.


Example 2.21 


Find the number of distinct permutations of k white balls and n — k black balls. 

This problem is equivalent to the following sampling problem: Put n tokens numbered 1 to 
n in an urn, where each token represents a position in the arrangement of balls; pick a combina- 
tion of k tokens and put the k white balls in the corresponding positions. Each combination of 
size k leads to a distinct arrangement (permutation) of k white balls and n — k black balls. Thus 
the number of distinct permutations of k white balls and n − k black balls is $C_k^n$.

As a specific example let n = 4 and k = 2. The number of combinations of size 2 from a
set of four distinct objects is

$$\binom{4}{2} = \frac{4!}{2!\,2!} = \frac{4(3)}{2(1)} = 6.$$


The 6 distinct permutations with 2 whites (zeros) and 2 blacks (ones) are 


1100 0110 0011 1001 1010 0101. 


Example 2.22 Quality Control 


A batch of 50 items contains 10 defective items. Suppose 10 items are selected at random and
tested. What is the probability that exactly 5 of the items tested are defective?

The number of ways of selecting 10 items out of a batch of 50 is the number of
combinations of size 10 from a set of 50 objects:

$$\binom{50}{10} = \frac{50!}{10!\,40!}.$$

The number of ways of selecting 5 defective and 5 nondefective items from the batch of 50 is the
product $N_1 N_2$, where $N_1$ is the number of ways of selecting the 5 items from the set of 10
defective items, and $N_2$ is the number of ways of selecting 5 items from the 40 nondefective items. Thus
the probability that exactly 5 tested items are defective is

$$\frac{\binom{10}{5}\binom{40}{5}}{\binom{50}{10}} = \frac{10!\,40!\,10!\,40!}{5!\,5!\,35!\,5!\,50!} = .016.$$
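
The calculation in Example 2.22 generalizes to any number of defective items in the sample. The sketch below uses `math.comb` for the binomial coefficients; the function name and its default arguments are assumptions that simply mirror the numbers in the example.

```python
from math import comb

def prob_exactly(defective_in_sample, sample_size=10, batch_size=50, defective_in_batch=10):
    """Probability that the sample contains exactly the given number of defective items."""
    good_in_sample = sample_size - defective_in_sample
    good_in_batch = batch_size - defective_in_batch
    favorable = comb(defective_in_batch, defective_in_sample) * comb(good_in_batch, good_in_sample)
    return favorable / comb(batch_size, sample_size)

p5 = prob_exactly(5)                                 # the case worked in the example
total = sum(prob_exactly(j) for j in range(11))      # all possible counts 0..10

print(round(p5, 3), total)
```

Summing over every possible count gives one, which is a quick sanity check on the counting argument.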


Example 2.21 shows that sampling without replacement and without ordering is
equivalent to partitioning the set of n distinct objects into two sets: B, containing the k
items that are picked from the urn, and $B^c$, containing the n − k left behind. Suppose
we partition a set of n distinct objects into J subsets $B_1, B_2, \ldots, B_J$, where $B_j$ is
assigned $k_j$ elements and $k_1 + k_2 + \cdots + k_J = n$.

In Problem 2.61, it is shown that the number of distinct partitions is


$$\frac{n!}{k_1!\,k_2! \cdots k_J!}. \tag{2.26}$$
Equation (2.26) is called the multinomial coefficient. The binomial coefficient is the 
J = 2 case of the multinomial coefficient. 


Example 2.23 


A six-sided die is tossed 12 times. How many distinct sequences of faces (numbers from the set 
{1, 2, 3, 4, 5, 6}) have each number appearing exactly twice? What is the probability of obtaining 
such a sequence? 

The number of distinct sequences in which each face of the die appears exactly twice is the
same as the number of partitions of the set {1, 2, ..., 12} into 6 subsets of size 2, namely

$$\frac{12!}{2!\,2!\,2!\,2!\,2!\,2!} = \frac{12!}{2^6} = 7{,}484{,}400.$$

From Eq. (2.20) we have that there are $6^{12}$ possible outcomes in 12 tosses of a die. If we suppose
that all of these have equal probabilities, then the probability of obtaining a sequence in which
each face appears exactly twice is

$$\frac{12!/2^6}{6^{12}} = \frac{7{,}484{,}400}{2{,}176{,}782{,}336} = 3.4(10^{-3}).$$
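
The multinomial coefficient of Eq. (2.26) is easy to compute directly, and a brute-force count on a smaller problem gives some reassurance that the formula counts what it claims to. The helper `multinomial` and the reduced three-face, six-toss check are assumptions introduced here for illustration.

```python
from collections import Counter
from itertools import product
from math import factorial

def multinomial(*counts):
    """n! / (k_1! k_2! ... k_J!) with n = k_1 + ... + k_J, as in Eq. (2.26)."""
    result = factorial(sum(counts))
    for k in counts:
        result //= factorial(k)        # each partial quotient is an exact integer
    return result

# Example 2.23: 12 tosses, each of the six faces appearing exactly twice
n_sequences = multinomial(2, 2, 2, 2, 2, 2)
p_each_twice = n_sequences / 6**12

# Smaller brute-force check: 3 faces, 6 tosses, each face exactly twice
brute = sum(
    sorted(Counter(seq).values()) == [2, 2, 2]
    for seq in product(range(3), repeat=6)
)

print(n_sequences, p_each_twice, brute)
```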


Sampling with Replacement and without Ordering


Suppose we pick k objects from a set of n distinct objects with replacement and we 
record the result without regard to order. This can be done by filling out a form which 
has n columns, one for each distinct object. Each time an object is selected, an “x” is 
placed in the corresponding column. For example, if we are picking 5 objects from 4
distinct objects, one possible form would look like this:

Object 1   Object 2   Object 3   Object 4
   xx    /          /     x    /    xx

where the slash symbol (“/”) is used to separate the entries for different columns. Note
that this form can be summarized by the sequence

xx//x/xx

where the n − 1 /’s indicate the lines between columns, and where nothing appears
between consecutive /’s if the corresponding object was not selected. Each different
arrangement of 5 x’s and 3 /’s leads to a distinct form. If we identify x’s with “white
balls” and /’s with “black balls,” then this problem was considered in Example 2.21, and
the number of different arrangements is given by $\binom{8}{3}$.

In the general case the form will involve k x’s and n − 1 /’s. Thus the number of
different ways of picking k objects from a set of n distinct objects with replacement and
without ordering is given by

$$\binom{n - 1 + k}{k} = \binom{n - 1 + k}{n - 1}.$$
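
The counting formula above can be compared with an explicit listing of the unordered samples. The sketch below relies on `itertools.combinations_with_replacement` from the Python standard library; the function name `count_unordered_with_replacement` is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def count_unordered_with_replacement(n, k):
    """Number of ways to pick k objects from n distinct objects, with replacement, ignoring order."""
    return comb(n - 1 + k, k)

# The case discussed in the text: 5 picks from 4 distinct objects
forms = list(combinations_with_replacement(range(1, 5), 5))

print(len(forms), count_unordered_with_replacement(4, 5))
```

The form xx//x/xx from the text corresponds to the sample (1, 1, 3, 4, 4) in this listing.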


CONDITIONAL PROBABILITY 


Quite often we are interested in determining whether two events, A and B, are related in 
the sense that knowledge about the occurrence of one, say B, alters the likelihood of oc- 
currence of the other, A. This requires that we find the conditional probability, P[ A | B], 
of event A given that event B has occurred. The conditional probability is defined by 


$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} \qquad \text{for } P[B] > 0. \tag{2.27}$$

Knowledge that event B has occurred implies that the outcome of the experiment
is in the set B. In computing P[A | B] we can therefore view the experiment as
now having the reduced sample space B as shown in Fig. 2.9. The event A occurs in the
reduced sample space if and only if the outcome ζ is in $A \cap B$. Equation (2.27) simply
renormalizes the probability of events that occur jointly with B. Thus if we let A = B,
Eq. (2.27) gives P[B | B] = 1, as required. It is easy to show that P[A | B], for fixed B,
satisfies the axioms of probability. (See Problem 2.74.)

If we interpret probability as relative frequency, then P[A | B] should be the
relative frequency of the event $A \cap B$ in experiments where B occurred. Suppose that the
experiment is performed n times, and suppose that event B occurs $n_B$ times, and that
event $A \cap B$ occurs $n_{A \cap B}$ times. The relative frequency of interest is then

$$\frac{n_{A \cap B}}{n_B} = \frac{n_{A \cap B}/n}{n_B/n} \to \frac{P[A \cap B]}{P[B]},$$

where we have implicitly assumed that P[B] > 0. This is in agreement with Eq. (2.27).


FIGURE 2.9
If B is known to have occurred, then A can occur only
if $A \cap B$ occurs.


Example 2.24 


A ball is selected from an urn containing two black balls, numbered 1 and 2, and two white balls, 
numbered 3 and 4. The number and color of the ball is noted, so the sample space is 
{(1, b), (2, b), (3, w), (4, w)}. Assuming that the four outcomes are equally likely, find P[A | B] 
and P[A |C], where A, B, and C are the following events: 


A = {(1,b), (2, b)}, “black ball selected,” 
B = {(2,b), (4, w)}, “even-numbered ball selected,” and 
C = {(3, w), (4, w)}, “number of ball is greater than 2.” 


Since $P[A \cap B] = P[(2, b)]$ and $P[A \cap C] = P[\emptyset] = 0$, Eq. (2.27) gives

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} = \frac{.25}{.5} = .5 = P[A]$$

$$P[A \mid C] = \frac{P[A \cap C]}{P[C]} = \frac{0}{.5} = 0 \ne P[A].$$


In the first case, knowledge of B did not alter the probability of A. In the second case, knowledge 
of C implied that A had not occurred. 
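
Example 2.24 can be reproduced by treating events as sets and applying Eq. (2.27) directly. This is a minimal sketch under the equally-likely assumption of the example; `prob` and `cond_prob` are names chosen here.

```python
from fractions import Fraction

S = {(1, "b"), (2, "b"), (3, "w"), (4, "w")}    # four equally likely outcomes
A = {s for s in S if s[1] == "b"}               # black ball selected
B = {s for s in S if s[0] % 2 == 0}             # even-numbered ball selected
C = {s for s in S if s[0] > 2}                  # number of ball is greater than 2

def prob(event):
    return Fraction(len(event), len(S))

def cond_prob(event, given):
    """P[event | given] from Eq. (2.27); only defined when P[given] > 0."""
    if prob(given) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(event & given) / prob(given)

print(cond_prob(A, B), prob(A))   # equal: knowing B does not change the probability of A
print(cond_prob(A, C))            # zero: C rules out A
```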


If we multiply both sides of the definition of P[A | B] by P[B] we obtain

$$P[A \cap B] = P[A \mid B]P[B]. \tag{2.28a}$$

Similarly we also have that

$$P[A \cap B] = P[B \mid A]P[A]. \tag{2.28b}$$

In the next example we show how this equation is useful in finding probabilities
in sequential experiments. The example also introduces a tree diagram that facilitates
the calculation of probabilities.


Example 2.25 


An urn contains two black balls and three white balls. Two balls are selected at random from the 
urn without replacement and the sequence of colors is noted. Find the probability that both balls 
are black. 

This experiment consists of a sequence of two subexperiments. We can imagine working 
our way down the tree shown in Fig. 2.10 from the topmost node to one of the bottom nodes: We 
reach node 1 in the tree if the outcome of the first draw is a black ball; then the next subexperi- 
ment consists of selecting a ball from an urn containing one black ball and three white balls. On 
the other hand, if the outcome of the first draw is white, then we reach node 2 in the tree and the 
second subexperiment consists of selecting a ball from an urn that contains two black balls and 
two white balls. Thus if we know which node is reached after the first draw, then we can state the 
probabilities of the outcome in the next subexperiment. 

Let $B_1$ and $B_2$ be the events that the outcome is a black ball in the first and second draw,
respectively. From Eq. (2.28b) we have

$$P[B_1 \cap B_2] = P[B_2 \mid B_1]P[B_1].$$

In terms of the tree diagram in Fig. 2.10, $P[B_1]$ is the probability of reaching node 1 and $P[B_2 \mid B_1]$ is
the probability of reaching the leftmost bottom node from node 1. Now $P[B_1] = 2/5$ since the first
draw is from an urn containing two black balls and three white balls; $P[B_2 \mid B_1] = 1/4$ since, given $B_1$,
the second draw is from an urn containing one black ball and three white balls. Thus

$$P[B_1 \cap B_2] = \frac{1}{4} \cdot \frac{2}{5} = \frac{1}{10}.$$

In general, the probability of any sequence of colors is obtained by multiplying the probabilities
corresponding to the node transitions in the tree in Fig. 2.10.


FIGURE 2.10
The paths from the top node to a bottom node correspond to the possible outcomes
in the drawing of two balls from an urn without replacement. The probability of a
path is the product of the probabilities in the associated transitions.


Example 2.26 Binary Communication System


Many communication systems can be modeled in the following way. First, the user inputs a 0 or a 1
into the system, and a corresponding signal is transmitted. Second, the receiver makes a decision
about what was the input to the system, based on the signal it received. Suppose that the user sends
0s with probability 1 − p and 1s with probability p, and suppose that the receiver makes random
decision errors with probability ε. For i = 0, 1, let $A_i$ be the event “input was i,” and let $B_i$ be the
event “receiver decision was i.” Find the probabilities $P[A_i \cap B_j]$ for i = 0, 1 and j = 0, 1.

The tree diagram for this experiment is shown in Fig. 2.11. We then readily obtain the
desired probabilities

$$P[A_0 \cap B_0] = (1 - p)(1 - \varepsilon),$$
$$P[A_0 \cap B_1] = (1 - p)\varepsilon,$$
$$P[A_1 \cap B_0] = p\varepsilon, \text{ and}$$
$$P[A_1 \cap B_1] = p(1 - \varepsilon).$$

Let $B_1, B_2, \ldots, B_n$ be mutually exclusive events whose union equals the sample
space S as shown in Fig. 2.12. We refer to these sets as a partition of S. Any event A can
be represented as the union of mutually exclusive events in the following way:

$$A = A \cap S = A \cap (B_1 \cup B_2 \cup \cdots \cup B_n) = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n).$$

(See Fig. 2.12.) By Corollary 4, the probability of A is

$$P[A] = P[A \cap B_1] + P[A \cap B_2] + \cdots + P[A \cap B_n].$$

By applying Eq. (2.28a) to each of the terms on the right-hand side, we obtain the
theorem on total probability:

$$P[A] = P[A \mid B_1]P[B_1] + P[A \mid B_2]P[B_2] + \cdots + P[A \mid B_n]P[B_n]. \tag{2.29}$$


This result is particularly useful when the experiments can be viewed as consist- 
ing of a sequence of two subexperiments as shown in the tree diagram in Fig. 2.10. 


FIGURE 2.11
Probabilities of input-output pairs in a binary transmission system.
[Tree diagram: first level, input into binary channel; second level, output from binary channel. Path probabilities: $(1 - p)(1 - \varepsilon)$, $(1 - p)\varepsilon$, $p\varepsilon$, $p(1 - \varepsilon)$.]


FIGURE 2.12
A partition of S into n disjoint sets.


Example 2.27 


In the experiment discussed in Example 2.25, find the probability of the event $W_2$ that the second
ball is white.

The events $B_1 = \{(b, b), (b, w)\}$ and $W_1 = \{(w, b), (w, w)\}$ form a partition of the sample
space, so applying Eq. (2.29) we have

$$P[W_2] = P[W_2 \mid B_1]P[B_1] + P[W_2 \mid W_1]P[W_1] = \frac{3}{4}\cdot\frac{2}{5} + \frac{1}{2}\cdot\frac{3}{5} = \frac{3}{5}.$$

It is interesting to note that this is the same as the probability of selecting a white ball in the first
draw. The result makes sense because we are computing the probability of a white ball in the second
draw under the assumption that we have no knowledge of the outcome of the first draw.


Example 2.28 


A manufacturing process produces a mix of “good” memory chips and “bad” memory chips. The
lifetime of good chips follows the exponential law introduced in Example 2.13, with a rate of failure
$\alpha$. The lifetime of bad chips also follows the exponential law, but the rate of failure is $1000\alpha$.
Suppose that the fraction of good chips is $1 - p$ and of bad chips, $p$. Find the probability that a
randomly selected chip is still functioning after $t$ seconds.

Let $C$ be the event “chip still functioning after $t$ seconds,” and let $G$ be the event “chip is
good,” and $B$ the event “chip is bad.” By the theorem on total probability we have

$$
\begin{aligned}
P[C] &= P[C \mid G]P[G] + P[C \mid B]P[B] \\
&= P[C \mid G](1 - p) + P[C \mid B]p \\
&= (1 - p)e^{-\alpha t} + p e^{-1000\alpha t},
\end{aligned}
$$

where we used the fact that $P[C \mid G] = e^{-\alpha t}$ and $P[C \mid B] = e^{-1000\alpha t}$.
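
The total-probability computation in Example 2.28 can be sketched numerically. The Python fragment below is only an illustration and is not part of the original text; the function name and the sample values ($1/\alpha = 20{,}000$ hours, $p = .10$, $t = 48$ hours) are assumptions chosen to exercise the formula.

```python
import math

def prob_chip_functioning(t, alpha, p):
    """P[C] from Example 2.28, obtained with the theorem on total probability."""
    p_c_given_good = math.exp(-alpha * t)           # P[C|G]
    p_c_given_bad = math.exp(-1000.0 * alpha * t)   # P[C|B]
    return p_c_given_good * (1.0 - p) + p_c_given_bad * p

# Assumed illustrative values: 1/alpha = 20,000 hours, p = .10, t = 48 hours.
p_c = prob_chip_functioning(48.0, 1.0 / 20000.0, 0.10)
print(round(p_c, 4))
```

At $t = 0$ the function returns 1, as it should, since every chip is functioning when the test starts.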
Bayes’ Rule


Let $B_1, B_2, \ldots, B_n$ be a partition of a sample space $S$. Suppose that event $A$ occurs; what
is the probability of event $B_j$? By the definition of conditional probability we have

$$P[B_j \mid A] = \frac{P[A \cap B_j]}{P[A]} = \frac{P[A \mid B_j]P[B_j]}{\sum_{k=1}^{n} P[A \mid B_k]P[B_k]}, \tag{2.30}$$

where we used the theorem on total probability to replace $P[A]$. Equation (2.30) is
called Bayes’ rule.

Bayes’ rule is often applied in the following situation. We have some random experiment
in which the events of interest form a partition. The “a priori probabilities” of
these events, $P[B_j]$, are the probabilities of the events before the experiment is performed.
Now suppose that the experiment is performed, and we are informed that
event $A$ occurred; the “a posteriori probabilities” are the probabilities of the events in
the partition, $P[B_j \mid A]$, given this additional information. The following two examples
illustrate this situation.


Example 2.29 Binary Communication System 


In the binary communication system in Example 2.26, find which input is more probable given
that the receiver has output a 1. Assume that, a priori, the input is equally likely to be 0 or 1.

Let $A_k$ be the event that the input was $k$, $k = 0, 1$; then $A_0$ and $A_1$ are a partition of the sample
space of input-output pairs. Let $B_1$ be the event “receiver output was a 1.” The probability of $B_1$ is

$$P[B_1] = P[B_1 \mid A_0]P[A_0] + P[B_1 \mid A_1]P[A_1] = \varepsilon\left(\frac{1}{2}\right) + (1 - \varepsilon)\left(\frac{1}{2}\right) = \frac{1}{2}.$$

Applying Bayes’ rule, we obtain the a posteriori probabilities

$$
\begin{aligned}
P[A_0 \mid B_1] &= \frac{P[B_1 \mid A_0]P[A_0]}{P[B_1]} = \frac{\varepsilon/2}{1/2} = \varepsilon, \\
P[A_1 \mid B_1] &= \frac{P[B_1 \mid A_1]P[A_1]}{P[B_1]} = \frac{(1 - \varepsilon)/2}{1/2} = 1 - \varepsilon.
\end{aligned}
$$

Thus, if $\varepsilon$ is less than 1/2, then input 1 is more likely than input 0 when a 1 is observed at the output
of the channel.
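
Bayes’ rule, Eq. (2.30), amounts to normalizing the products of likelihoods and a priori probabilities. The following Python sketch, which is not part of the original text, applies this to the binary channel; the function name `posteriors` and the value $\varepsilon = 0.1$ are assumptions made only for illustration.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' rule, Eq. (2.30).

    priors[j]      : a priori probability P[B_j] of the j-th event in the partition
    likelihoods[j] : P[A | B_j] for the observed event A
    Returns the a posteriori probabilities P[B_j | A].
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(joint)  # P[A], by the theorem on total probability
    return [j / total for j in joint]

# Binary channel with equally likely inputs; eps = 0.1 is an assumed value.
eps = 0.1
post = posteriors(priors=[0.5, 0.5], likelihoods=[eps, 1.0 - eps])
print(post)  # posterior probabilities of input 0 and input 1 given output 1
```

With equally likely inputs the two posteriors reduce to $\varepsilon$ and $1 - \varepsilon$, in agreement with the example.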


Example 2.30 Quality Control 


Consider the memory chips discussed in Example 2.28. Recall that a fraction $p$ of the chips are
bad and tend to fail much more quickly than good chips. Suppose that in order to “weed out”
the bad chips, every chip is tested for $t$ seconds prior to leaving the factory. The chips that fail
are discarded and the remaining chips are sent out to customers. Find the value of $t$ for which
99% of the chips sent out to customers are good.

Let $C$ be the event “chip still functioning after $t$ seconds,” and let $G$ be the event “chip is
good,” and $B$ be the event “chip is bad.” The problem requires that we find the value of $t$ for
which

$$P[G \mid C] = .99.$$

We find $P[G \mid C]$ by applying Bayes’ rule:

$$
\begin{aligned}
P[G \mid C] &= \frac{P[C \mid G]P[G]}{P[C \mid G]P[G] + P[C \mid B]P[B]} \\
&= \frac{(1 - p)e^{-\alpha t}}{(1 - p)e^{-\alpha t} + p e^{-1000\alpha t}} \\
&= \frac{1}{1 + \dfrac{p e^{-1000\alpha t}}{(1 - p)e^{-\alpha t}}} = .99.
\end{aligned}
$$

The above equation can then be solved for $t$:

$$t = \frac{1}{999\alpha} \ln\left(\frac{99p}{1 - p}\right).$$

For example, if $1/\alpha = 20{,}000$ hours and $p = .10$, then $t = 48$ hours.
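
The closed-form test time can be checked numerically. The sketch below is an illustration only and is not part of the original text; the function names and the `target` parameter (which generalizes the 99% requirement) are assumptions.

```python
import math

def burn_in_time(alpha, p, target=0.99):
    """Test time t for which P[G|C] equals target.

    Setting the Bayes' rule expression equal to target gives
    exp(999 * alpha * t) = target * p / ((1 - target) * (1 - p)),
    which reduces to the formula in the text when target = .99.
    """
    odds = target / (1.0 - target)  # 99 when target = .99
    return math.log(odds * p / (1.0 - p)) / (999.0 * alpha)

def prob_good_given_functioning(t, alpha, p):
    """P[G|C] from Bayes' rule, used here to check the solution."""
    good = (1.0 - p) * math.exp(-alpha * t)
    bad = p * math.exp(-1000.0 * alpha * t)
    return good / (good + bad)

# Values from the example: 1/alpha = 20,000 hours and p = .10.
alpha, p = 1.0 / 20000.0, 0.10
t = burn_in_time(alpha, p)
print(round(t, 1), round(prob_good_given_functioning(t, alpha, p), 4))
```

Substituting the computed $t$ back into the Bayes’ rule expression is a useful sanity check on the algebra.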


2.5 INDEPENDENCE OF EVENTS 


If knowledge of the occurrence of an event B does not alter the probability of some 
other event A, then it would be natural to say that event A is independent of B. In 
terms of probabilities this situation occurs when 


$$P[A] = P[A \mid B] = \frac{P[A \cap B]}{P[B]}.$$

The above equation has the problem that the right-hand side is not defined when
$P[B] = 0$.

We will define two events $A$ and $B$ to be independent if

$$P[A \cap B] = P[A]P[B]. \tag{2.31}$$

Equation (2.31) then implies both

$$P[A \mid B] = P[A] \tag{2.32a}$$

and

$$P[B \mid A] = P[B]. \tag{2.32b}$$

Note also that Eq. (2.32a) implies Eq. (2.31) when $P[B] \neq 0$ and Eq. (2.32b) implies
Eq. (2.31) when $P[A] \neq 0$.


Example 2.31


A ball is selected from an urn containing two black balls, numbered 1 and 2, and two white balls,
numbered 3 and 4. Let the events $A$, $B$, and $C$ be defined as follows:

$$
\begin{aligned}
A &= \{(1, b), (2, b)\}, \text{ “black ball selected”;} \\
B &= \{(2, b), (4, w)\}, \text{ “even-numbered ball selected”; and} \\
C &= \{(3, w), (4, w)\}, \text{ “number of ball is greater than 2.”}
\end{aligned}
$$

Are events $A$ and $B$ independent? Are events $A$ and $C$ independent?

First, consider events $A$ and $B$. The probabilities required by Eq. (2.31) are

$$P[A] = P[B] = \frac{1}{2}$$

and

$$P[A \cap B] = P[\{(2, b)\}] = \frac{1}{4}.$$

Thus

$$P[A \cap B] = \frac{1}{4} = P[A]P[B],$$

and the events $A$ and $B$ are independent. Equation (2.32a) gives more insight into the meaning
of independence:

$$
\begin{aligned}
P[A \mid B] &= \frac{P[A \cap B]}{P[B]} = \frac{P[\{(2, b)\}]}{P[\{(2, b), (4, w)\}]} = \frac{1/4}{1/2} = \frac{1}{2}, \\
P[A] &= \frac{P[A]}{P[S]} = \frac{P[\{(1, b), (2, b)\}]}{P[\{(1, b), (2, b), (3, w), (4, w)\}]} = \frac{1/2}{1} = \frac{1}{2}.
\end{aligned}
$$

These two equations imply that $P[A] = P[A \mid B]$ because the proportion of outcomes in $S$ that
lead to the occurrence of $A$ is equal to the proportion of outcomes in $B$ that lead to $A$. Thus knowledge
of the occurrence of $B$ does not alter the probability of the occurrence of $A$.

Events $A$ and $C$ are not independent since $P[A \cap C] = P[\varnothing] = 0$, so

$$P[A \mid C] = 0 \neq P[A] = \frac{1}{2}.$$

In fact, $A$ and $C$ are mutually exclusive since $A \cap C = \varnothing$, so the occurrence of $C$ implies that $A$
has definitely not occurred.


In general, if two events have nonzero probability and are mutually exclusive,
then they cannot be independent. For suppose they were independent and mutually
exclusive; then

$$0 = P[A \cap B] = P[A]P[B],$$

which implies that at least one of the events must have zero probability.


Example 2.32


Two numbers $x$ and $y$ are selected at random between zero and one. Let the events $A$, $B$, and $C$
be defined as follows:

$$A = \{x > 0.5\}, \quad B = \{y > 0.5\}, \quad\text{and}\quad C = \{x > y\}.$$

Are the events $A$ and $B$ independent? Are $A$ and $C$ independent?

Figure 2.13 shows the regions of the unit square that correspond to the above events.
Using Eq. (2.32a), we have

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} = \frac{1/4}{1/2} = \frac{1}{2} = P[A],$$

so events $A$ and $B$ are independent. Again we have that the “proportion” of outcomes in $S$ leading
to $A$ is equal to the “proportion” in $B$ that lead to $A$.

Using Eq. (2.32a) with $C$ in place of $B$, we have

$$P[A \mid C] = \frac{P[A \cap C]}{P[C]} = \frac{3/8}{1/2} = \frac{3}{4} \neq \frac{1}{2} = P[A],$$

so events $A$ and $C$ are not independent. Indeed from Fig. 2.13(b) we can see that knowledge of
the fact that $x$ is greater than $y$ increases the probability that $x$ is greater than 0.5.

What conditions should three events $A$, $B$, and $C$ satisfy in order for them to be
independent? First, they should be pairwise independent, that is,

$$P[A \cap B] = P[A]P[B], \quad P[A \cap C] = P[A]P[C], \quad\text{and}\quad P[B \cap C] = P[B]P[C].$$


FIGURE 2.13
Examples of independent and nonindependent events.
(a) Events A and B are independent. (b) Events A and C are not independent.


In addition, knowledge of the joint occurrence of any two, say $A$ and $B$, should not affect
the probability of the third, that is,

$$P[C \mid A \cap B] = P[C].$$

In order for this to hold, we must have

$$P[C \mid A \cap B] = \frac{P[A \cap B \cap C]}{P[A \cap B]} = P[C].$$

This in turn implies that we must have

$$P[A \cap B \cap C] = P[A \cap B]P[C] = P[A]P[B]P[C],$$

where we have used the fact that $A$ and $B$ are pairwise independent. Thus we conclude
that three events $A$, $B$, and $C$ are independent if the probability of the intersection of any
pair or triplet of events is equal to the product of the probabilities of the individual events.

The following example shows that if three events are pairwise independent, it
does not necessarily follow that $P[A \cap B \cap C] = P[A]P[B]P[C]$.


Example 2.33 


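Consider the experiment discussed in Example 2.32 where two numbers are selected at random
from the unit interval. Let the events $B$, $D$, and $F$ be defined as follows:

$$
\begin{aligned}
B &= \left\{y > \tfrac{1}{2}\right\}, \qquad D = \left\{x < \tfrac{1}{2}\right\}, \\
F &= \left\{x < \tfrac{1}{2} \text{ and } y < \tfrac{1}{2}\right\} \cup \left\{x > \tfrac{1}{2} \text{ and } y > \tfrac{1}{2}\right\}.
\end{aligned}
$$

The three events are shown in Fig. 2.14. It can be easily verified that any pair of these events is
independent:

$$
\begin{aligned}
P[B \cap D] &= \frac{1}{4} = P[B]P[D], \\
P[B \cap F] &= \frac{1}{4} = P[B]P[F], \quad\text{and} \\
P[D \cap F] &= \frac{1}{4} = P[D]P[F].
\end{aligned}
$$

However, the three events are not independent, since $B \cap D \cap F = \varnothing$, so

$$P[B \cap D \cap F] = P[\varnothing] = 0 \neq P[B]P[D]P[F] = \frac{1}{8}.$$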


FIGURE 2.14
Events B, D, and F are pairwise independent, but the
triplet B, D, F are not independent events.
(a) $B = \{y > \frac{1}{2}\}$. (b) $D = \{x < \frac{1}{2}\}$. (c) $F = \{x < \frac{1}{2} \text{ and } y < \frac{1}{2}\} \cup \{x > \frac{1}{2} \text{ and } y > \frac{1}{2}\}$.


In order for a set of $n$ events to be independent, the probability of an event
should be unchanged when we are given the joint occurrence of any subset of the other
events. This requirement naturally leads to the following definition of independence.
The events $A_1, A_2, \ldots, A_n$ are said to be independent if for $k = 2, \ldots, n$,

$$P[A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}] = P[A_{i_1}]P[A_{i_2}] \cdots P[A_{i_k}], \tag{2.33}$$

where $1 \le i_1 < i_2 < \cdots < i_k \le n$. For a set of $n$ events we need to verify that the
probabilities of all $2^n - n - 1$ possible intersections factor in the right way.
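
For a small finite sample space, the $2^n - n - 1$ conditions in Eq. (2.33) can be checked mechanically. The Python sketch below is not part of the original text. As a simplifying assumption, it models Example 2.33 by splitting the unit square into four equiprobable quadrants, each named by a hypothetical two-letter label (first letter: $x$ below or above 1/2; second letter: $y$ below or above 1/2).

```python
from fractions import Fraction
from itertools import combinations

# Assumed finite model of Example 2.33: four equiprobable quadrants.
# First letter: L (x < 1/2) or R (x > 1/2); second letter: L (y < 1/2) or U (y > 1/2).
SAMPLE_SPACE = {"LL", "LU", "RL", "RU"}
EVENTS = {
    "B": {"LU", "RU"},  # y > 1/2
    "D": {"LL", "LU"},  # x < 1/2
    "F": {"LL", "RU"},  # both coordinates below 1/2, or both above 1/2
}

def prob(event):
    """Probability of an event when all outcomes are equiprobable."""
    return Fraction(len(event), len(SAMPLE_SPACE))

def failed_intersections(events):
    """Return every group of two or more events that violates Eq. (2.33)."""
    failures = []
    names = sorted(events)
    for k in range(2, len(names) + 1):
        for group in combinations(names, k):
            intersection = set(SAMPLE_SPACE)
            product = Fraction(1)
            for name in group:
                intersection &= events[name]
                product *= prob(events[name])
            if prob(intersection) != product:
                failures.append(group)
    return failures

failures = failed_intersections(EVENTS)
print(failures)
```

In this model every pair factors correctly and only the triplet fails, which is the situation described in Example 2.33. Exact fractions are used so that the equality test is not affected by rounding.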

The above definition of independence appears quite cumbersome because it re- 
quires that so many conditions be verified. However, the most common application of 
the independence concept is in making the assumption that the events of separate ex- 
periments are independent. We refer to such experiments as independent experiments. 
For example, it is common to assume that the outcome of a coin toss is independent of 
the outcomes of all prior and all subsequent coin tosses. 


Example 2.34 


Suppose a fair coin is tossed three times and we observe the resulting sequence of heads and
tails. Find the probability of the elementary events.

The sample space of this experiment is $S = \{HHH, HHT, HTH, THH, TTH, THT, HTT, TTT\}$.
The assumption that the coin is fair means that the outcomes of a single toss are
equiprobable, that is, $P[H] = P[T] = 1/2$. If we assume that the outcomes of the coin tosses are
independent, then

$$
\begin{aligned}
P[\{HHH\}] &= P[\{H\}]P[\{H\}]P[\{H\}] = \tfrac{1}{8}, \\
P[\{HHT\}] &= P[\{H\}]P[\{H\}]P[\{T\}] = \tfrac{1}{8}, \\
P[\{HTH\}] &= P[\{H\}]P[\{T\}]P[\{H\}] = \tfrac{1}{8}, \\
P[\{THH\}] &= P[\{T\}]P[\{H\}]P[\{H\}] = \tfrac{1}{8}, \\
P[\{TTH\}] &= P[\{T\}]P[\{T\}]P[\{H\}] = \tfrac{1}{8}, \\
P[\{THT\}] &= P[\{T\}]P[\{H\}]P[\{T\}] = \tfrac{1}{8}, \\
P[\{HTT\}] &= P[\{H\}]P[\{T\}]P[\{T\}] = \tfrac{1}{8}, \quad\text{and} \\
P[\{TTT\}] &= P[\{T\}]P[\{T\}]P[\{T\}] = \tfrac{1}{8}.
\end{aligned}
$$


Example 2.35 System Reliability 


A system consists of a controller and three peripheral units. The system is said to be “up” if the
controller and at least two of the peripherals are functioning. Find the probability that the system
is up, assuming that all components fail independently.

Define the following events: $A$ is “controller is functioning” and $B_i$ is “peripheral $i$ is functioning”
where $i = 1, 2, 3$. The event $F$, “two or more peripheral units are functioning,” occurs if
all three units are functioning or if exactly two units are functioning. Thus

$$
\begin{aligned}
F = {} & (B_1 \cap B_2 \cap B_3^c) \cup (B_1 \cap B_2^c \cap B_3) \\
& \cup (B_1^c \cap B_2 \cap B_3) \cup (B_1 \cap B_2 \cap B_3).
\end{aligned}
$$

Note that the events in the above union are mutually exclusive. Thus

$$
\begin{aligned}
P[F] &= P[B_1]P[B_2]P[B_3^c] + P[B_1]P[B_2^c]P[B_3] \\
&\quad + P[B_1^c]P[B_2]P[B_3] + P[B_1]P[B_2]P[B_3] \\
&= 3(1 - a)^2 a + (1 - a)^3,
\end{aligned}
$$

where we have assumed that each peripheral fails with probability $a$, so that $P[B_i] = 1 - a$ and
$P[B_i^c] = a$.

The event “system is up” is then $A \cap F$. If we assume that the controller fails with probability
$p$, then

$$
\begin{aligned}
P[\text{“system up”}] &= P[A \cap F] = P[A]P[F] \\
&= (1 - p)P[F] \\
&= (1 - p)\{3(1 - a)^2 a + (1 - a)^3\}.
\end{aligned}
$$

Let $a = 10\%$; then all three peripherals are functioning $(1 - a)^3 = 72.9\%$ of the time and
two are functioning and one is “down” $3(1 - a)^2 a = 24.3\%$ of the time. Thus two or more
peripherals are functioning 97.2% of the time. Suppose that the controller is not very reliable,
say $p = 20\%$; then the system is up only 77.8% of the time, mostly because of controller
failures.

Suppose a second identical controller with $p = 20\%$ is added to the system, and that the
system is “up” if at least one of the controllers is functioning and if two or more of the peripherals
are functioning. In Problem 2.94, you are asked to show that at least one of the controllers is
functioning 96% of the time, and that the system is up 93.3% of the time. This is an increase of
about 16 percentage points over the system with a single controller.
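
The reliability figures quoted in Example 2.35 can be reproduced with a few lines of code. The Python sketch below is not part of the original text; the function names and the `n_controllers` parameter are assumptions, and the redundant-controller case uses the fact that all $n$ independent controllers fail with probability $p^n$.

```python
def prob_two_or_more_peripherals(a):
    """P[F]: at least two of three independent peripherals are functioning."""
    up = 1.0 - a
    return 3.0 * up**2 * a + up**3

def prob_system_up(a, p, n_controllers=1):
    """P["system up"] when at least one of n_controllers independent controllers works."""
    p_some_controller_up = 1.0 - p**n_controllers
    return p_some_controller_up * prob_two_or_more_peripherals(a)

a, p = 0.10, 0.20  # failure probabilities used in the example
single = prob_system_up(a, p)
dual = prob_system_up(a, p, n_controllers=2)
print(round(prob_two_or_more_peripherals(a), 3), round(single, 3), round(dual, 3))
```

The three printed values correspond to the 97.2%, 77.8%, and 93.3% figures in the example.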


2.6 SEQUENTIAL EXPERIMENTS


Many random experiments can be viewed as sequential experiments that consist of a 
sequence of simpler subexperiments. These subexperiments may or may not be inde- 
pendent. In this section we discuss methods for obtaining the probabilities of events in 
sequential experiments. 


2.6.1 Sequences of Independent Experiments


Suppose that a random experiment consists of performing experiments $E_1, E_2, \ldots, E_n$.
The outcome of this experiment will then be an $n$-tuple $s = (s_1, \ldots, s_n)$, where $s_k$ is the
outcome of the $k$th subexperiment. The sample space of the sequential experiment is
defined as the set that contains the above $n$-tuples and is denoted by the Cartesian
product of the individual sample spaces $S_1 \times S_2 \times \cdots \times S_n$.

We can usually determine, because of physical considerations, when the subexperiments
are independent, in the sense that the outcome of any given subexperiment cannot
affect the outcomes of the other subexperiments. Let $A_1, A_2, \ldots, A_n$ be events such
that $A_k$ concerns only the outcome of the $k$th subexperiment. If the subexperiments are
independent, then it is reasonable to assume that the above events $A_1, A_2, \ldots, A_n$ are
independent. Thus

$$P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]P[A_2] \cdots P[A_n]. \tag{2.34}$$


This expression allows us to compute all probabilities of events of the sequential ex- 
periment. 


Example 2.36 


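Suppose that 10 numbers are selected at random from the interval [0, 1]. Find the probability
that the first 5 numbers are less than 1/4 and the last 5 numbers are greater than 1/2. Let
$x_1, x_2, \ldots, x_{10}$ be the sequence of 10 numbers; then the events of interest are

$$
\begin{aligned}
A_k &= \left\{x_k < \tfrac{1}{4}\right\} \quad \text{for } k = 1, \ldots, 5, \\
A_k &= \left\{x_k > \tfrac{1}{2}\right\} \quad \text{for } k = 6, \ldots, 10.
\end{aligned}
$$

If we assume that each selection of a number is independent of the other selections, then

$$P[A_1 \cap A_2 \cap \cdots \cap A_{10}] = P[A_1]P[A_2] \cdots P[A_{10}] = \left(\frac{1}{4}\right)^5 \left(\frac{1}{2}\right)^5.$$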

We will now derive several important models for experiments that consist of sequences
of independent subexperiments.


2.6.2 The Binomial Probability Law


A Bernoulli trial involves performing an experiment once and noting whether a partic- 
ular event A occurs. The outcome of the Bernoulli trial is said to be a “success” if A oc- 
curs and a “failure” otherwise. In this section we are interested in finding the 
probability of k successes in n independent repetitions of a Bernoulli trial. 

We can view the outcome of a single Bernoulli trial as the outcome of a toss of a coin
for which the probability of heads (success) is $p = P[A]$. The probability of $k$ successes in
$n$ Bernoulli trials is then equal to the probability of $k$ heads in $n$ tosses of the coin.


Example 2.37 


Suppose that a coin is tossed three times. If we assume that the tosses are independent and the
probability of heads is $p$, then the probability for the sequences of heads and tails is

$$
\begin{aligned}
P[\{HHH\}] &= P[\{H\}]P[\{H\}]P[\{H\}] = p^3, \\
P[\{HHT\}] &= P[\{H\}]P[\{H\}]P[\{T\}] = p^2(1 - p), \\
P[\{HTH\}] &= P[\{H\}]P[\{T\}]P[\{H\}] = p^2(1 - p), \\
P[\{THH\}] &= P[\{T\}]P[\{H\}]P[\{H\}] = p^2(1 - p), \\
P[\{TTH\}] &= P[\{T\}]P[\{T\}]P[\{H\}] = p(1 - p)^2, \\
P[\{THT\}] &= P[\{T\}]P[\{H\}]P[\{T\}] = p(1 - p)^2, \\
P[\{HTT\}] &= P[\{H\}]P[\{T\}]P[\{T\}] = p(1 - p)^2, \quad\text{and} \\
P[\{TTT\}] &= P[\{T\}]P[\{T\}]P[\{T\}] = (1 - p)^3,
\end{aligned}
$$

where we used the fact that the tosses are independent. Let $k$ be the number of heads in three
trials; then

$$
\begin{aligned}
P[k = 0] &= P[\{TTT\}] = (1 - p)^3, \\
P[k = 1] &= P[\{TTH, THT, HTT\}] = 3p(1 - p)^2, \\
P[k = 2] &= P[\{HHT, HTH, THH\}] = 3p^2(1 - p), \quad\text{and} \\
P[k = 3] &= P[\{HHH\}] = p^3.
\end{aligned}
$$


The result in Example 2.37 is the n = 3 case of the binomial probability law. 


Theorem 


Let $k$ be the number of successes in $n$ independent Bernoulli trials; then the probabilities of $k$ are
given by the binomial probability law:

$$p_n(k) = \binom{n}{k} p^k (1 - p)^{n-k} \quad \text{for } k = 0, \ldots, n, \tag{2.35}$$

where $p_n(k)$ is the probability of $k$ successes in $n$ trials, and

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!} \tag{2.36}$$

is the binomial coefficient.


The term $n!$ in Eq. (2.36) is called $n$ factorial and is defined by $n! = n(n - 1) \cdots (2)(1)$.
By definition $0!$ is equal to 1.

We now prove the above theorem. Following Example 2.34 we see that each of
the sequences with $k$ successes and $n - k$ failures has the same probability, namely
$p^k(1 - p)^{n-k}$. Let $N_n(k)$ be the number of distinct sequences that have $k$ successes
and $n - k$ failures; then

$$p_n(k) = N_n(k)\,p^k (1 - p)^{n-k}. \tag{2.37}$$

The expression $N_n(k)$ is the number of ways of picking $k$ positions out of $n$ for the successes.
It can be shown that⁵

$$N_n(k) = \binom{n}{k}. \tag{2.38}$$

The theorem follows by substituting Eq. (2.38) into Eq. (2.37).


Example 2.38 


Verify that Eq. (2.35) gives the probabilities found in Example 2.37.

In Example 2.37, let “toss results in heads” correspond to a “success”; then

$$
\begin{aligned}
p_3(0) &= \frac{3!}{0!\,3!}\, p^0 (1 - p)^3 = (1 - p)^3, \\
p_3(1) &= \frac{3!}{1!\,2!}\, p^1 (1 - p)^2 = 3p(1 - p)^2, \\
p_3(2) &= \frac{3!}{2!\,1!}\, p^2 (1 - p)^1 = 3p^2(1 - p), \quad\text{and} \\
p_3(3) &= \frac{3!}{3!\,0!}\, p^3 (1 - p)^0 = p^3,
\end{aligned}
$$

which are in agreement with our previous results.
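
The agreement between Eq. (2.35) and a direct enumeration of sequences can also be checked by computer. The following Python sketch is not part of the original text; the function names and the value $p = 0.3$ are assumptions chosen for illustration.

```python
from itertools import product
from math import comb, isclose

def binomial_pmf(k, n, p):
    """Binomial probability law, Eq. (2.35)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def pmf_by_enumeration(n, p):
    """Add the probabilities of all 2**n head/tail sequences, grouped by number of heads."""
    totals = [0.0] * (n + 1)
    for seq in product("HT", repeat=n):
        k = seq.count("H")
        totals[k] += p**k * (1.0 - p) ** (n - k)  # independent tosses
    return totals

p = 0.3  # assumed probability of heads
enumerated = pmf_by_enumeration(3, p)
formula = [binomial_pmf(k, 3, p) for k in range(4)]
print(all(isclose(e, f) for e, f in zip(enumerated, formula)))
```

Enumeration requires $2^n$ terms and is practical only for small $n$, whereas the formula needs a single binomial coefficient per value of $k$.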


You were introduced to the binomial coefficient in an introductory calculus
course when the binomial theorem was discussed:

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \tag{2.39a}$$

If we let $a = b = 1$, then

$$2^n = \sum_{k=0}^{n} \binom{n}{k} = \sum_{k=0}^{n} N_n(k),$$

which is in agreement with the fact that there are $2^n$ distinct possible sequences of successes
and failures in $n$ trials. If we let $a = p$ and $b = 1 - p$ in Eq. (2.39a), we then obtain

$$1 = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k} = \sum_{k=0}^{n} p_n(k), \tag{2.39b}$$

which confirms that the binomial probabilities sum to 1.

⁵ See Example 2.21.


The term $n!$ grows very quickly with $n$, so numerical problems are encountered for
relatively small values of $n$ if one attempts to compute $p_n(k)$ directly using Eq. (2.35).
The following recursive formula avoids the direct evaluation of $n!$ and thus extends the
range of $n$ for which $p_n(k)$ can be computed before encountering numerical difficulties:

$$p_n(k + 1) = \frac{(n - k)p}{(k + 1)(1 - p)}\, p_n(k). \tag{2.40}$$

Later in the book, we present two approximations for the binomial probabilities for
the case when $n$ is large.
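
A minimal sketch of the recursion in Eq. (2.40) is given below. It is not part of the original text. It assumes $0 < p < 1$ and starts from $p_n(0) = (1 - p)^n$, which follows from Eq. (2.35) with $k = 0$; the function name and the test values $n = 10$, $p = 0.25$ are arbitrary.

```python
from math import comb, isclose

def binomial_pmf_recursive(n, p):
    """Return [p_n(0), ..., p_n(n)] using the recursion of Eq. (2.40); assumes 0 < p < 1."""
    probs = [(1.0 - p) ** n]  # p_n(0), from Eq. (2.35) with k = 0
    ratio = p / (1.0 - p)
    for k in range(n):
        probs.append(probs[k] * (n - k) / (k + 1) * ratio)
    return probs

n, p = 10, 0.25  # assumed illustrative values
recursive = binomial_pmf_recursive(n, p)
direct = [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]
print(all(isclose(r, d) for r, d in zip(recursive, direct)))
```

For very large $n$ the starting value $(1 - p)^n$ may itself underflow in floating point, so this sketch only extends the usable range of $n$; it does not remove the numerical limits altogether.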


Example 2.39 


Let $k$ be the number of active (nonsilent) speakers in a group of eight noninteracting (i.e., independent)
speakers. Suppose that a speaker is active with probability 1/3. Find the probability that
the number of active speakers is greater than six.

For $i = 1, \ldots, 8$, let $A_i$ denote the event “$i$th speaker is active.” The number of active
speakers is then the number of successes in eight Bernoulli trials with $p = 1/3$. Thus the probability
that more than six speakers are active is

$$
\begin{aligned}
P[k = 7] + P[k = 8] &= \binom{8}{7}\left(\frac{1}{3}\right)^7\left(\frac{2}{3}\right) + \binom{8}{8}\left(\frac{1}{3}\right)^8 \\
&= .00244 + .00015 = .00259.
\end{aligned}
$$


Example 2.40 Error Correction Coding 


A communication system transmits binary information over a channel that introduces random
bit errors with probability $\varepsilon = 10^{-3}$. The transmitter transmits each information bit three times,
and a decoder takes a majority vote of the received bits to decide on what the transmitted bit
was. Find the probability that the receiver will make an incorrect decision.

The receiver can correct a single error, but it will make the wrong decision if the channel
introduces two or more errors. If we view each transmission as a Bernoulli trial in which a “success”
corresponds to the introduction of an error, then the probability of two or more errors in
three Bernoulli trials is

$$P[k \ge 2] = \binom{3}{2}(.001)^2(.999) + \binom{3}{3}(.001)^3 \approx 3(10^{-6}).$$
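
The majority-vote calculation generalizes directly to any odd number of repetitions. The Python sketch below is not part of the original text; the function name and the parameter `n` are assumptions, and the call shown reproduces the three-repetition case of Example 2.40.

```python
from math import comb

def majority_vote_error(eps, n=3):
    """Probability that more than half of n independent transmissions are in error (n odd)."""
    threshold = n // 2 + 1  # smallest number of errors that defeats the majority vote
    return sum(
        comb(n, k) * eps**k * (1.0 - eps) ** (n - k)
        for k in range(threshold, n + 1)
    )

p_wrong = majority_vote_error(1e-3)
print(p_wrong)
```

The result is slightly below $3(10^{-6})$ because the exact value is $3\varepsilon^2(1 - \varepsilon) + \varepsilon^3$.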


2.6.3 The Multinomial Probability Law


The binomial probability law can be generalized to the case where we note the occurrence
of more than one event. Let $B_1, B_2, \ldots, B_M$ be a partition of the sample
space $S$ of some random experiment and let $P[B_j] = p_j$. The events are mutually exclusive
and their union is $S$, so

$$p_1 + p_2 + \cdots + p_M = 1.$$

Suppose that $n$ independent repetitions of the experiment are performed. Let $k_j$
be the number of times event $B_j$ occurs; then the vector $(k_1, k_2, \ldots, k_M)$ specifies the
number of times each of the events $B_j$ occurs. The probability of the vector $(k_1, \ldots, k_M)$
satisfies the multinomial probability law:

$$P[(k_1, k_2, \ldots, k_M)] = \frac{n!}{k_1!\,k_2! \cdots k_M!}\, p_1^{k_1} p_2^{k_2} \cdots p_M^{k_M}, \tag{2.41}$$

where $k_1 + k_2 + \cdots + k_M = n$. The binomial probability law is the $M = 2$ case of the
multinomial probability law. The derivation of the multinomial probabilities is identical
to that of the binomial probabilities. We only need to note that the number of different
sequences with $k_1, k_2, \ldots, k_M$ instances of the events $B_1, B_2, \ldots, B_M$ is given by
the multinomial coefficient in Eq. (2.26).


Example 2.41 


A dart is thrown nine times at a target consisting of three areas. Each throw has a probability of
.2, .3, and .5 of landing in areas 1, 2, and 3, respectively. Find the probability that the dart lands
exactly three times in each of the areas.

This experiment consists of nine independent repetitions of a subexperiment that has
three possible outcomes. The probability for the number of occurrences of each outcome is given
by the multinomial probabilities with parameters $n = 9$ and $p_1 = .2$, $p_2 = .3$, and $p_3 = .5$:

$$P[(3, 3, 3)] = \frac{9!}{3!\,3!\,3!}(.2)^3(.3)^3(.5)^3 = .04536.$$


Example 2.42 


Suppose we pick 10 telephone numbers at random from a telephone book and note the last digit in
each of the numbers. What is the probability that we obtain each of the integers from 0 to 9 only once?

The probabilities for the number of occurrences of the integers are given by the multinomial
probabilities with parameters $M = 10$, $n = 10$, and $p_j = 1/10$ if we assume that the 10 integers in
the range 0 to 9 are equiprobable. The probability of obtaining each integer once in 10 draws is then

$$\frac{10!}{1!\,1! \cdots 1!}(.1)^{10} \approx 3.6(10^{-4}).$$
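
The multinomial probabilities in Examples 2.41 and 2.42 can be evaluated with a short routine. The Python sketch below is not part of the original text; the function name is an assumption, and the multinomial coefficient is built by successive exact integer divisions of $n!$.

```python
from math import factorial, isclose, prod

def multinomial_prob(counts, probs):
    """Multinomial probability law, Eq. (2.41), for counts k_j and probabilities p_j."""
    coeff = factorial(sum(counts))      # n!
    for k in counts:
        coeff //= factorial(k)          # each division is exact
    return coeff * prod(p**k for p, k in zip(probs, counts))

darts = multinomial_prob([3, 3, 3], [0.2, 0.3, 0.5])   # Example 2.41
digits = multinomial_prob([1] * 10, [0.1] * 10)        # Example 2.42
print(round(darts, 5), digits)
```

The second value equals $10!/10^{10}$, which the example rounds to $3.6(10^{-4})$.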


2.6.4 The Geometric Probability Law


Consider a sequential experiment in which we repeat independent Bernoulli trials
until the occurrence of the first success. Let the outcome of this experiment be $m$, the
number of trials carried out until the occurrence of the first success. The sample space
for this experiment is the set of positive integers. The probability, $p(m)$, that $m$ trials are
required is found by noting that this can only happen if the first $m - 1$ trials result in
failures and the $m$th trial in success.⁶ The probability of this event is

$$p(m) = P[A_1^c A_2^c \cdots A_{m-1}^c A_m] = (1 - p)^{m-1} p, \qquad m = 1, 2, \ldots, \tag{2.42a}$$

where $A_i$ is the event “success in $i$th trial.” The probability assignment specified by
Eq. (2.42a) is called the geometric probability law.

The probabilities in Eq. (2.42a) sum to 1:

$$\sum_{m=1}^{\infty} p(m) = p \sum_{m=1}^{\infty} q^{m-1} = p\,\frac{1}{1 - q} = 1, \tag{2.42b}$$

where $q = 1 - p$, and where we have used the formula for the summation of a geometric
series. The probability that more than $K$ trials are required before a success occurs has a
simple form:

$$
\begin{aligned}
P[\{m > K\}] &= p \sum_{m=K+1}^{\infty} q^{m-1} = p\,q^K \sum_{j=0}^{\infty} q^j \\
&= p\,q^K\,\frac{1}{1 - q} \\
&= q^K.
\end{aligned} \tag{2.43}
$$

Example 2.43 Error Control by Retransmission 


Computer A sends a message to computer B over an unreliable radio link. The message is encoded 
so that B can detect when errors have been introduced into the message during transmission. If B 
detects an error, it requests A to retransmit it. If the probability of a message transmission error is 
q = .1, what is the probability that a message needs to be transmitted more than two times? 

Each transmission of a message is a Bernoulli trial with probability of success p = 1 — q. 
The Bernoulli trials are repeated until the first success (error-free transmission). The probability 
that more than two transmissions are required is given by Eq. (2.43): 


$$P[m > 2] = q^2 = 10^{-2}.$$
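
The geometric probability law and the tail formula of Eq. (2.43) can be cross-checked numerically. The Python sketch below is not part of the original text; the function names are assumptions, and the values are those of Example 2.43 ($q = .1$, so $p = .9$).

```python
from math import isclose

def geometric_pmf(m, p):
    """Probability that the first success occurs on trial m, Eq. (2.42a)."""
    return (1.0 - p) ** (m - 1) * p

def prob_more_than(K, p):
    """Probability that more than K trials are required, Eq. (2.43)."""
    return (1.0 - p) ** K

p = 0.9  # probability of an error-free transmission in Example 2.43
tail = prob_more_than(2, p)
complement = 1.0 - (geometric_pmf(1, p) + geometric_pmf(2, p))
print(round(tail, 6), isclose(tail, complement, abs_tol=1e-12))
```

Computing the tail as one minus the probabilities of $m = 1$ and $m = 2$ gives the same value as $q^2$, as Eq. (2.43) requires.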


2.6.5 Sequences of Dependent Experiments


In this section we consider a sequence or “chain” of subexperiments in which the out- 
come of a given subexperiment determines which subexperiment is performed next. 
We first give a simple example of such an experiment and show how diagrams can be 
used to specify the sample space. 


⁶ See Example 2.11 in Section 2.2 for a relative frequency interpretation of how the geometric probability law
comes about.


Example 2.44


A sequential experiment involves repeatedly drawing a ball from one of two urns, noting the
number on the ball, and replacing the ball in its urn. Urn 0 contains a ball with the number 1
and two balls with the number 0, and urn 1 contains five balls with the number 1 and one ball
with the number 0. The urn from which the first draw is made is selected at random by flipping
a fair coin. Urn 0 is used if the outcome is heads and urn 1 if the outcome is tails. Thereafter the
urn used in a subexperiment corresponds to the number on the ball selected in the previous
subexperiment.

The sample space of this experiment consists of sequences of 0s and 1s. Each possible sequence
corresponds to a path through the “trellis” diagram shown in Fig. 2.15(a). The nodes in
the diagram denote the urn used in the $n$th subexperiment, and the labels in the branches denote
the outcome of a subexperiment. Thus the path 0011 corresponds to the sequence: The coin toss
was heads so the first draw was from urn 0; the outcome of the first draw was 0, so the second
draw was from urn 0; the outcome of the second draw was 1, so the third draw was from urn 1;
and the outcome from the third draw was 1, so the fourth draw is from urn 1.


Now suppose that we want to compute the probability of a particular sequence of
outcomes, say $s_0, s_1, s_2$. Denote this probability by $P[\{s_0\} \cap \{s_1\} \cap \{s_2\}]$. Let $A = \{s_2\}$
and $B = \{s_0\} \cap \{s_1\}$; then since $P[A \cap B] = P[A \mid B]P[B]$ we have

$$
\begin{aligned}
P[\{s_0\} \cap \{s_1\} \cap \{s_2\}] &= P[\{s_2\} \mid \{s_0\} \cap \{s_1\}]\,P[\{s_0\} \cap \{s_1\}] \\
&= P[\{s_2\} \mid \{s_0\} \cap \{s_1\}]\,P[\{s_1\} \mid \{s_0\}]\,P[\{s_0\}].
\end{aligned} \tag{2.44}
$$

Now note that in the above urn example the probability $P[\{s_n\} \mid \{s_0\} \cap \cdots \cap \{s_{n-1}\}]$
depends only on $\{s_{n-1}\}$ since the most recent outcome determines which subexperiment
is performed:

$$P[\{s_n\} \mid \{s_0\} \cap \cdots \cap \{s_{n-1}\}] = P[\{s_n\} \mid \{s_{n-1}\}]. \tag{2.45}$$


FIGURE 2.15
Trellis diagram for a Markov chain.
(a) Each sequence of outcomes corresponds to a path through this trellis diagram.
(b) The probability of a sequence of outcomes is the product of the probabilities along the associated path.


Therefore for the sequence of interest we have that

$$P[\{s_0\} \cap \{s_1\} \cap \{s_2\}] = P[\{s_2\} \mid \{s_1\}]\,P[\{s_1\} \mid \{s_0\}]\,P[\{s_0\}]. \tag{2.46}$$

Sequential experiments that satisfy Eq. (2.45) are called Markov chains. For these
experiments, the probability of a sequence $s_0, s_1, \ldots, s_n$ is given by

$$P[s_0, s_1, \ldots, s_n] = P[s_n \mid s_{n-1}]\,P[s_{n-1} \mid s_{n-2}] \cdots P[s_1 \mid s_0]\,P[s_0], \tag{2.47}$$

where we have simplified notation by omitting braces. Thus the probability of the sequence
$s_0, \ldots, s_n$ is given by the product of the probability of the first outcome $s_0$ and
the probabilities of all subsequent transitions, $s_0$ to $s_1$, $s_1$ to $s_2$, and so on. Chapter 11
deals with Markov chains.


Example 2.45 


Find the probability of the sequence 0011 for the urn experiment introduced in Example 2.44.

Recall that urn 0 contains two balls with label 0 and one ball with label 1, and that urn 1
contains five balls with label 1 and one ball with label 0. We can readily compute the probabilities
of sequences of outcomes by labeling the branches in the trellis diagram with the probability of
the corresponding transition as shown in Fig. 2.15(b). Thus the probability of the sequence 0011 is
given by

$$P[0011] = P[1 \mid 1]\,P[1 \mid 0]\,P[0 \mid 0]\,P[0],$$

where the transition probabilities are given by

$$
\begin{aligned}
P[1 \mid 0] &= \frac{1}{3} \quad\text{and}\quad P[0 \mid 0] = \frac{2}{3}, \\
P[1 \mid 1] &= \frac{5}{6} \quad\text{and}\quad P[0 \mid 1] = \frac{1}{6},
\end{aligned}
$$

and the initial probabilities are given by

$$P[0] = \frac{1}{2} = P[1].$$

If we substitute these values into the expression for $P[0011]$, we obtain

$$P[0011] = \left(\frac{5}{6}\right)\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)\left(\frac{1}{2}\right) = \frac{5}{54}.$$
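
Equation (2.47) translates directly into a loop over successive transitions. The Python sketch below is not part of the original text; the table and function names are assumptions, and the numerical entries are the initial and transition probabilities of the two-urn experiment.

```python
from fractions import Fraction
from itertools import product

# Initial and transition probabilities of the two-urn experiment.
INITIAL = {0: Fraction(1, 2), 1: Fraction(1, 2)}
TRANSITION = {
    0: {0: Fraction(2, 3), 1: Fraction(1, 3)},  # draws from urn 0
    1: {0: Fraction(1, 6), 1: Fraction(5, 6)},  # draws from urn 1
}

def sequence_probability(seq):
    """Probability of the sequence s_0, s_1, ..., s_n according to Eq. (2.47)."""
    prob = INITIAL[seq[0]]
    for previous, current in zip(seq, seq[1:]):
        prob *= TRANSITION[previous][current]
    return prob

p_0011 = sequence_probability([0, 0, 1, 1])
total = sum(sequence_probability(s) for s in product([0, 1], repeat=4))
print(p_0011, total)
```

Summing over all sequences of a given length gives 1, which is a convenient check that the transition table is a valid probability assignment.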


The two-urn experiment in Examples 2.44 and 2.45 is the simplest example of the 
Markov chain models that are discussed in Chapter 11. The two-urn experiment dis- 
cussed here is used to model situations in which there are only two outcomes, and in 
which the outcomes tend to occur in bursts. For example, the two-urn model has been 
used to model the “bursty” behavior of the voice packets generated by a single speak- 
er where bursts of active packets are separated by relatively long periods of silence. 
The model has also been used for the sequence of black and white dots that result from
scanning a black and white image line by line.


2.7 A COMPUTER METHOD FOR SYNTHESIZING RANDOMNESS: RANDOM NUMBER
GENERATORS


This section introduces the basic method for generating sequences of “random” num- 
bers using a computer. Any computer simulation of a system that involves randomness 
must include a method for generating sequences of random numbers. These random 
numbers must satisfy long-term average properties of the processes they are simulating. 
In this section we focus on the problem of generating random numbers that are “uni- 
formly distributed” in the interval [0, 1]. In the next chapter we will show how these ran- 
dom numbers can be used to generate numbers with arbitrary probability laws. 

The first problem we must confront in generating a random number in the inter- 
val [0, 1] is the fact that there are an uncountably infinite number of points in the in- 
terval, but the computer is limited to representing numbers with finite precision only. 
We must therefore be content with generating equiprobable numbers from some finite 
set, say {0, 1,..., M — 1} or {1,2,..., M}. By dividing these numbers by M, we obtain 
numbers in the unit interval. These numbers can be made increasingly dense in the unit 
interval by making M very large. 

The next step involves finding a mechanism for generating random numbers. The
direct approach involves performing random experiments. For example, we can generate
integers in the range 0 to $2^m - 1$ by flipping a fair coin $m$ times and replacing the
sequence of heads and tails by 0s and 1s to obtain the binary representation of an integer.
Another example would involve drawing a ball from an urn containing balls numbered
1 to $M$. Computer simulations involve the generation of long sequences of
random numbers. If we were to use the above mechanisms to generate random numbers,
we would have to perform the experiments a large number of times and store the
outcomes in computer storage for access by the simulation program. It is clear that this
approach is cumbersome and quickly becomes impractical.


2.7.1 Pseudo-Random Number Generation


The preferred approach for the computer generation of random numbers involves the 
use of recursive formulas that can be implemented easily and quickly. These pseudo- 
random number generators produce a sequence of numbers that appear to be random 
but that in fact repeat after a very long period. The currently preferred pseudo-random 
number generator is the so-called Mersenne Twister, which is based on a matrix linear 
recurrence over a binary field. This algorithm can yield sequences with an extremely
long period of $2^{19937} - 1$. The Mersenne Twister generates 32-bit integers, so
$M = 2^{32} - 1$ in terms of our previous discussion. We obtain a sequence of numbers in
the unit interval by dividing the 32-bit integers by $2^{32}$. The sequence of such numbers
should be equally distributed over unit cubes of very high dimensionality. The
Mersenne Twister has been shown to meet this condition up to 623-dimensionality. In
addition, the algorithm is fast and efficient in terms of storage. 
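
As a small illustration outside MATLAB and Octave, the standard random module of Python also uses the Mersenne Twister as its core generator. The sketch below draws 32-bit integers and scales them into the unit interval; the seed value and variable names are arbitrary choices of ours.

```python
import random

rng = random.Random(12345)                         # arbitrary seed
ints = [rng.getrandbits(32) for _ in range(5)]     # 32-bit integers in 0 .. 2**32 - 1
unit = [k / 2**32 for k in ints]                   # divide by 2**32 to land in [0, 1)

# A pseudo-random sequence is fully determined by its seed:
rng2 = random.Random(12345)
repeat = [rng2.getrandbits(32) for _ in range(5)]
print(unit)
print(ints == repeat)                              # True
```

The second generator reproduces the first sequence exactly, which is what "pseudo-random" means in practice: the numbers only appear random.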

Software implementations of the Mersenne Twister are widely available and incorporated
into numerical packages such as MATLAB® and Octave. Both MATLAB and
Octave provide a means to generate random numbers from the unit interval using the
rand command. The rand(n, m) operator returns an n row by m column matrix with
elements that are random numbers from the interval [0, 1). This operator is the starting
point for generating all types of random numbers.

Footnote: MATLAB® and Octave are interactive computer programs for numerical computations involving matrices.
MATLAB® is a commercial product sold by The MathWorks, Inc. Octave is a free, open-source program that is
mostly compatible with MATLAB in terms of computation. Long [9] provides an introduction to Octave.


Example 2.46 Generation of Numbers from the Unit Interval 


First, generate 6 numbers from the unit interval. Next, generate 10,000 numbers from the unit
interval. Plot the histogram and empirical distribution function for the sequence of 10,000 numbers.
The following command results in the generation of six numbers from the unit interval.

>rand(1,6) 

ans = 

Columns 1 through 6: 

0.642667 0.147811 0.317465 0.512824 0.710823 0.406724 


The following set of commands will generate 10000 numbers and produce the histogram 
shown in Fig. 2.16. 


>X=rand(10000,1);              % Return result in a 10,000-element column vector X.
>K=0.005:0.01:0.995;           % Produce row vector K consisting of the midpoints
                               % for 100 bins of width 0.01 in the unit interval.
>hist(X,K)                     % Produce the desired histogram in Fig. 2.16.
                               % Plot the proportion of elements in the array X less
                               % than or equal to k, where k is an element of K.
>plot(K,empirical_cdf(K,X))


The empirical cdf is shown in Fig. 2.17. It is evident that the array of random numbers is uni- 
formly distributed in the unit interval. 
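
For readers without Octave at hand, the same experiment can be approximated in plain Python without plotting. This is a sketch under our own assumptions: the seed, the helper name empirical_cdf (chosen to mirror the Octave call), and the summary statistics printed at the end are illustrative only.

```python
import random

rng = random.Random(1)            # arbitrary seed
N = 10_000
X = [rng.random() for _ in range(N)]

# Histogram: 100 bins of width 0.01 covering the unit interval.
num_bins = 100
counts = [0] * num_bins
for x in X:
    counts[min(int(x * num_bins), num_bins - 1)] += 1

def empirical_cdf(k, data):
    """Proportion of elements of data that are less than or equal to k."""
    return sum(1 for x in data if x <= k) / len(data)

K = [0.005 + 0.01 * i for i in range(num_bins)]    # bin midpoints
cdf = [empirical_cdf(k, X) for k in K]

# For a uniform law the cdf at k should be close to k itself.
max_gap = max(abs(c - k) for c, k in zip(cdf, K))
print(min(counts), max(counts), max_gap)
```

Bin counts near N/100 = 100 and a small value of max_gap are the numerical counterparts of the flat histogram and the straight-line empirical cdf.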
FIGURE 2.16
Histogram resulting from experiment to generate 10,000 numbers in the unit interval.

FIGURE 2.17
Empirical cdf of experiment that generates 10,000 numbers.


Simulation of Random Experiments 


MATLAB® and Octave provide functions that are very useful in carrying out numer- 
ical evaluation of probabilities involving the most common distributions. Functions 
are also provided for the generation of random numbers with specific probability dis- 
tributions. In this section we consider Bernoulli trials and binomial distributions. In 
Chapter 3 we consider experiments with discrete sample spaces. 


Example 2.47 Bernoulli Trials and Binomial Probabilities 


First, generate the outcomes of eight Bernoulli trials. Next, generate the outcomes of 100 repetitions
of a random experiment that counts the number of successes in 16 Bernoulli trials with
probability of success 1/2. Plot the histogram of the outcomes in the 100 experiments and compare
to the binomial probabilities with n = 16 and p = 1/2.

The following command will generate the outcomes of eight Bernoulli trials, as shown by 
the answer that follows. 


>X=rand(1,8)<0.5; % Generate 1 row of Bernoulli trials with p = 0.5 
X =
0 1 1 0 0 0 1 1


If the number produced by rand for a given Bernoulli trial is less than p = 0.5, then the outcome 
of the Bernoulli trial is 1. 
Next we show the set of commands to generate the outcomes of 100 repetitions of random
experiments where each involves 16 Bernoulli trials.


>X=rand(100,16)<0.5;           % Generate 100 rows of 16 Bernoulli trials with
                               % p=0.5.
>Y=sum(X,2);                   % Add the results of each row to obtain the number of
                               % successes in each experiment. Y contains the 100
                               % outcomes.
>K=0:16;
>Z=empirical_pdf(K,Y);         % Find the relative frequencies of the outcomes in Y.
>bar(K,Z)                      % Produce a bar graph of the relative frequencies.
>hold on                       % Retains the graph for next command.
>stem(K,binomial_pdf(K,16,0.5))  % Plot the binomial probabilities along
                               % with the corresponding relative frequencies.


Figure 2.18 shows that there is good agreement between the relative frequencies and 
the binomial probabilities. 
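
The comparison can also be carried out numerically rather than graphically. The following Python sketch repeats the experiment and tabulates relative frequencies next to the binomial probabilities; the seed and variable names are our own, and the particular frequencies obtained will differ from those in Fig. 2.18.

```python
import math
import random

rng = random.Random(7)            # arbitrary seed
n, p, reps = 16, 0.5, 100

# Each experiment counts the successes in n Bernoulli trials.
Y = [sum(1 for _ in range(n) if rng.random() < p) for _ in range(reps)]

rel_freq = [Y.count(k) / reps for k in range(n + 1)]
binom = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k in range(n + 1):
    print(f"{k:2d}  {rel_freq[k]:.3f}  {binom[k]:.3f}")
```

With only 100 repetitions the agreement is rough; increasing reps brings the relative frequencies closer to the binomial probabilities.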


FINE POINTS: EVENT CLASSES


If the sample space S is discrete, then the event class can consist of all subsets of S. 
There are situations where we may wish or are compelled to let the event class F be a 
smaller class of subsets of S. In these situations, only the subsets that belong to this 
class are considered events. In this section we explain how these situations arise. 

Let C be the class of events of interest in a random experiment. It is reasonable to 
expect that any set operation on events in C will produce a set that is also an event in C. 
We can then ask any question regarding events of the random experiment, express it 
using set operations, and obtain an event that is in C. Mathematically, we require that C 
be a field. 

A collection of sets F is called a field if it satisfies the following conditions: 


(i) $\emptyset \in F$ (2.48a)
(ii) if $A \in F$ and $B \in F$, then $A \cup B \in F$ (2.48b)
(iii) if $A \in F$ then $A^c \in F$. (2.48c)


Using DeMorgan’s rule we can show that (ii) and (iii) imply that if $A \in F$ and
$B \in F$, then $A \cap B \in F$. Conditions (ii) and (iii) then imply that any finite union or
intersection of events in F will result in an event that is also in F.
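
For a finite sample space, the three field conditions can be checked mechanically. The Python sketch below is one way to do so; the function names is_field and closed_under_intersection and the small sample space {1, 2, 3} are hypothetical choices made only for illustration.

```python
from itertools import combinations

def is_field(S, F):
    """Check conditions (i)-(iii) for a finite collection F of subsets of S."""
    S = frozenset(S)
    F = {frozenset(A) for A in F}
    if frozenset() not in F:                       # (i) the empty set is in F
        return False
    if any(S - A not in F for A in F):             # (iii) closed under complement
        return False
    return all(A | B in F for A, B in combinations(F, 2))   # (ii) closed under union

def closed_under_intersection(F):
    F = {frozenset(A) for A in F}
    return all(A & B in F for A, B in combinations(F, 2))

S = {1, 2, 3}
F1 = [set(), {1}, {2, 3}, {1, 2, 3}]
F2 = [set(), {1}, {2}, {1, 2, 3}]      # the complement of {1} is missing

print(is_field(S, F1), closed_under_intersection(F1))   # True True
print(is_field(S, F2))                                   # False
```

The first collection passes all three conditions and, as DeMorgan's rule predicts, is then automatically closed under intersection.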


Example 2.48 


Let S = {T, H}. Find the field generated by set operations on the class consisting of elementary
events of S: C = {{H}, {T}}.


Footnote: The “Fine Points” sections elaborate on concepts and distinctions that are not required in an introductory
course. The material in these sections is not necessarily more mathematical, but rather is not usually covered
in a first course in probability.

FIGURE 2.18
Relative frequencies from 100 binomial experiments and corresponding binomial probabilities.


Let F be the class generated by C. First note that {H} ∪ {T} = {H, T} = S, which implies
that S is in F. Next we find that $S^c = \emptyset$, which implies that $\emptyset \in F$. Any other set operations will
not yield events that are not already in F. Therefore


$F = \{\emptyset, \{H\}, \{T\}, \{H, T\}\} = \mathcal{S}$.


Note that we have generated the power set of S and shown that it is a field. 


The above example can be generalized to any finite or countably infinite set S. 
We can generate the power set $\mathcal{S}$ by taking all possible unions of elementary events
and their complements, and $\mathcal{S}$ forms a field. Note that in Example 2.1, this includes the
random experiments $E_1$, $E_2$, $E_3$, $E_4$, and $E_5$. Classical probability deals with finite sample
spaces and so taking the class of events of interest as the power set is sufficient to proceed
to the final step in specifying a probability model, namely, to provide a rule for
assigning probabilities to events. 

The following example shows that in some situations the field F of events of inter- 
est need not include all subsets of the sample space S. In this case only those subsets of S 
that are in F are considered valid events. For this reason, we will restrict the use of the term 
“event” to sets that are in the field F that is associated with a given random experiment. 


Example 2.49 Lisa and Homer's Urn Experiment 


An urn contains three white balls. One ball has a red dot, another ball has a green dot, and the 
third ball has a teal dot. The experiment consists of selecting a ball at random and noting the 
color of the ball. 




When Lisa does the experiment, she has sample space $S_L = \{r, g, t\}$, and her power set
has $2^3 = 8$ events:


$\mathcal{S}_L = \{\emptyset, \{r\}, \{g\}, \{t\}, \{r, g\}, \{r, t\}, \{g, t\}, \{r, g, t\}\}$.


When Homer does the experiment, he has a smaller sample space $S_H = \{R, G\}$ because
Homer cannot tell green from teal! Homer’s power set has 4 events:


$\mathcal{S}_H = \{\emptyset, \{R\}, \{G\}, \{R, G\}\}$.


Homer does not understand what the problem is. He can deal with any union, intersection, or
complement of events in $\mathcal{S}_H$.

The problem of course is that Lisa is interested in sets that include questions about teal.
Homer’s class of events $\mathcal{S}_H$ cannot handle these questions. Lisa figures out what’s happened as
follows. She notes that Homer has partitioned Lisa’s sample space $S_L$ as follows (see Fig. 2.19b):


$A_1 = \{r\}$ and $A_2 = \{g, t\}$.


Each event in Homer’s experiment is related to an equivalent event in Lisa’s experiment.
Every union, complement, or intersection in Homer’s event class corresponds to the union,
complement, or intersection of the corresponding $A_k$’s in the partition. For example, the event “the
outcome is R or G” leads to the following:


$\{R\} \cup \{G\}$ corresponds to $A_1 \cup A_2 = \{r, g, t\}$.
FIGURE 2.19
(a) Homer's mapping; (b) Partition of Lisa's sample space;
(c) Partitioning of a sample space.
You can try any combination of unions, intersections, and complements of events in Homer’s
experiment and the corresponding operations on $A_1$ and/or $A_2$ will result in events in the field:


$F = \{\emptyset, \{r\}, \{g, t\}, \{r, g, t\}\}$.


The field F does not contain all of the events in Lisa’s power set $\mathcal{S}_L$. The field F suffices to
address events that only involve the outcomes in $S_H$. Questions that involve distinguishing
between teal and green lead to subsets of $S_L$, such as {r, t}, that are not events in F and hence are
outside the scope of the experiment.

Lisa explains it all to Homer, and, predictably, his response is “D’oh!” 


The sets in the field F that specify the events of interest are said to be 
measurable. Any subset of S that is not in F is not measurable. In the above exam- 
ple, the set {r, t} is not measurable with respect to F. The situation in the above ex- 
ample occurs very frequently in practice, where a decision is made to restrict the 
scope of questions about a random experiment. Indeed this is part of the modeling 
process! 

In the general case, the sample space S in the original random experiment is divided
into mutually exclusive events $A_1, \ldots, A_n$, where $A_i \cap A_j = \emptyset$ for $i \ne j$ and


$S = A_1 \cup A_2 \cup \cdots \cup A_n$,


as shown in Fig. 2.19(c). The collection of events $A_1, \ldots, A_n$ is said to form a partition
of S. When the experiment is performed, we observe which event in the partition occurs
and not the specific outcome $\zeta$. All questions (events) that involve unions, intersections,
or complements of the events in the partition can be answered from this
observation. The events in the partition are like elementary events. We can obtain the
field F generated by the events in the partition by taking unions of all distinct combinations
of the $A_1, \ldots, A_n$ and their complements. In this case, the subsets of S that are
not in F are not measurable and thus are not considered to be events.
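
The construction of a field from a partition is easy to carry out by computer when S is finite. The Python sketch below forms all unions of partition events; the function name field_from_partition is our own, and the two partitions used are Homer's partition of Lisa's sample space and the partition of three coin tosses by number of heads.

```python
from itertools import combinations, product

def field_from_partition(blocks):
    """Return every union of events taken from a partition (including the empty union)."""
    blocks = [frozenset(b) for b in blocks]
    field = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            field.add(frozenset().union(*chosen))
    return field

# Homer's partition of Lisa's sample space {r, g, t}.
homer = field_from_partition([{"r"}, {"g", "t"}])
print(len(homer), frozenset({"r", "t"}) in homer)   # 4 False

# Three coin tosses, partitioned by the number of heads.
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]
by_heads = [{w for w in outcomes if w.count("H") == k} for k in range(4)]
coin_field = field_from_partition(by_heads)
print(len(coin_field))                              # 16
```

A partition with n events always yields $2^n$ events in the generated field, since each event of the field is determined by which partition events it contains.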


Example 2.50 


In Experiment $E_3$ a coin is tossed three times and the sequence of heads and tails is recorded.
The sample space is $S_3$ = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH} and the corresponding
power set $\mathcal{S}_3$ has $2^8 = 256$ events:


$\mathcal{S}_3 = \{\emptyset, \{TTT\}, \{TTH\}, \ldots, \{HHH\}, \{TTT, TTH\}, \ldots, \{THH, HHH\}, \ldots, S_3\}$.


In Experiment $E_4$ the coin is tossed three times but only the number of heads is recorded.
The sample space is $S_4 = \{0, 1, 2, 3\}$ and the corresponding power set $\mathcal{S}_4$ has $2^4 = 16$ events:


$\mathcal{S}_4 = \{\emptyset, \{0\}, \{1\}, \{2\}, \{3\}, \{0, 1\}, \{0, 2\}, \{0, 3\}, \{1, 2\}, \{1, 3\},
\{2, 3\}, \{0, 1, 2\}, \{0, 1, 3\}, \{0, 2, 3\}, \{1, 2, 3\}, S_4\}$.

Experiment $E_4$ divides the sample space $S_3$ into the following partition:

$A_0$ = {TTT}, $A_1$ = {TTH, THT, HTT},
$A_2$ = {THH, HTH, HHT}, $A_3$ = {HHH}.
All the events in $\mathcal{S}_4$ correspond to unions, intersections, and complements of $A_0$, $A_1$, $A_2$, and
$A_3$. The field F generated by unions, intersections, and complements of these four events has 16
events and addresses all questions associated with Experiment $E_4$.

We see that the event space is greatly simplified and reduced in size by restricting the 
events of interest to those that only involve the total number of heads and not details about the 
sequence of heads and tails. The simplification is even more marked as we increase the number 
of tosses. For example, if we extend $E_3$ to 100 coin tosses, then $S_3$ has $2^{100}$ outcomes, a huge
number, whereas $S_4$ has only 101 outcomes.


Now suppose that S is countably infinite. For example, in Experiment $E_6$ we have
S = {1, 2, ...} and we might be interested in the condition “number of transmissions
is greater than 10.” This condition corresponds to the set {11, 12, 13, ...}, which is a
countable union of elementary sets. It is clear that for events in our class of interest, we
should now require that a countable union of events should also be an event, that is:


(i) $\emptyset \in F$ (2.49a)

(ii) if $A_1, A_2, \ldots \in F$ then $\bigcup_{k=1}^{\infty} A_k \in F$ (2.49b)

(iii) if $A \in F$ then $A^c \in F$. (2.49c)


A class of sets F that satisfies Eqs. (2.49a)–(2.49c) is called a sigma field. As before,
equations (ii) and (iii) and DeMorgan’s rule imply that countable intersections of events
$\bigcap_{k=1}^{\infty} A_k$ are also in F.
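
The closure under countable intersections can be written out as a short chain of implications. This is only a sketch of the argument, using the numbering of Eqs. (2.49b) and (2.49c):

```latex
\begin{align*}
A_1, A_2, \ldots \in F
  &\;\Longrightarrow\; A_1^c, A_2^c, \ldots \in F
     && \text{by (2.49c)} \\
  &\;\Longrightarrow\; \bigcup_{k=1}^{\infty} A_k^c \in F
     && \text{by (2.49b)} \\
  &\;\Longrightarrow\; \Bigl(\bigcup_{k=1}^{\infty} A_k^c\Bigr)^c
     = \bigcap_{k=1}^{\infty} A_k \in F
     && \text{by (2.49c) and DeMorgan's rule.}
\end{align*}
```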

Next consider the case where the sample space S is not countable, as in the
unit interval in the real line in Experiment $E_7$, or the unit square in the real plane in
$E_{12}$. (See Figs. 2.1(a) and (c).) The probability that the outcome of the experiment is
exactly a single point in $S_{12}$ is clearly zero. But this result is not very useful. Instead,
we can say that the probability of the event “the outcome (x, y) satisfies x > y” is
1/2, by noting that half of $S_{12}$ satisfies the condition of the event. Similarly, the probability
of any event that corresponds to a rectangle within $S_{12}$ is simply the area of
the rectangle. Taking the set of events that are rectangles within $S_{12}$, we can build a
field of events by forming countable unions, intersections, and complements. From
your previous experience using integrals to calculate areas in the plane, you know 
that we can approximate any reasonable shape, i.e., event, by taking the union of a 
sequence of increasingly fine rectangles as shown in Fig. 2.20(a). Clearly there is a 
strong relationship between calculating integrals, measuring areas, and assigning 
probabilities to events. 
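
The rectangle approximation can be tried out on the event “x > y” in the unit square. The Python sketch below is an illustration under our own assumptions: the unit square is cut into n × n small half-open squares, the function name rectangle_bounds is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rectangle_bounds(n):
    """Bound the area of {(x, y): x > y} in the unit square using an n-by-n grid.

    Cell (i, j) is [i/n, (i+1)/n) x [j/n, (j+1)/n). Cells with j < i lie entirely
    inside the event; cells with j <= i are the ones that intersect it.
    """
    cell = Fraction(1, n * n)
    inner = sum(1 for i in range(n) for j in range(n) if j < i)
    outer = sum(1 for i in range(n) for j in range(n) if j <= i)
    return inner * cell, outer * cell

for n in (10, 100, 400):
    lower, upper = rectangle_bounds(n)
    print(n, float(lower), float(upper))
```

As the grid becomes finer, the inner and outer unions of rectangles squeeze the probability of the event toward 1/2.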

We can finally explain (qualitatively) why we cannot allow all subsets of S to be 
events when the sample space is uncountably infinite. In essence, there are subsets that 
are so irregular (see Fig. 2.20b) that it is impossible to define integrals to measure 
them. We say that these subsets are not measurable. Advanced math is required to 
show this and we will not deal with this any further. The good news is that we can build 
a sigma field from the countable unions, intersections, and complements of intervals in
$R$, or rectangles in $R^2$ that have well-behaved integrals and to which we can assign
probabilities. This is familiar territory. In the remainder of this text, we will refer to
these sigma fields over $R$ and $R^2$ as the Borel fields.
FIGURE 2.20
If $A \subset B$, then $P(A) \le P(B)$.


FINE POINTS: PROBABILITIES OF SEQUENCES OF EVENTS 


In this optional section, we discuss the Borel field in more detail and show how se- 
quences of intervals can generate many events of practical interest. We then present a re- 
sult on the continuity of the probability function for a sequence of events. We show how 
this result is applied to find the probability of the limit of a sequence of Borel events. 


The Borel Field of Events 
Let S be the real line R. Consider events that are semi-infinite intervals of the real line:

$(-\infty, b] = \{x : -\infty < x \le b\}$.


We are interested in the Borel field $\mathcal{B}$, which is the sigma field generated by countable
unions, countable intersections, and complements of events of the form $(-\infty, b]$. We
will show that events of the following form are also in $\mathcal{B}$:


$(a, b)$, $[a, b]$, $(a, b]$, $[a, b)$, $[a, \infty)$, $(a, \infty)$, $(-\infty, b)$, $\{b\}$.

Since $(-\infty, b] \in \mathcal{B}$, its complement is in $\mathcal{B}$:

$(-\infty, b]^c = (b, \infty) \in \mathcal{B}$.

The following intersection must then be in $\mathcal{B}$:

$(a, \infty) \cap (-\infty, b] = (a, b]$ for $a < b$.


We claim for now that $(-\infty, b) \in \mathcal{B}$. Then the following complements and intersections
are also in $\mathcal{B}$:


$(-\infty, b)^c = [b, \infty)$ and $(a, \infty) \cap (-\infty, b) = (a, b)$ for $a < b$,
$[a, \infty) \cap (-\infty, b] = [a, b]$ and $[a, \infty) \cap (-\infty, b) = [a, b)$ for $a < b$,
and $[b, \infty) \cap (-\infty, b] = \{b\}$.


Furthermore, $\mathcal{B}$ contains all complements, countable unions, and intersections of events
of the above forms. Note in particular that $\mathcal{B}$ contains all singleton sets (elementary
events) $\{b\}$ and therefore all the events for discrete and countable sample spaces of
real numbers.
Let’s prove the above claim that $(-\infty, b) \in \mathcal{B}$. By definition, all events of the form
$(-\infty, b] \in \mathcal{B}$. Consider the sequence of events $A_n = (-\infty, b - 1/n] = \{x : -\infty < x \le b - 1/n\}$.
Note that the $A_n$ are an increasing sequence, that is, $A_n \subset A_{n+1}$. All $A_n \in \mathcal{B}$, so
their countable union is also in $\mathcal{B}$ by Eq. (2.49b):


$\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} \{x : -\infty < x \le b - 1/n\} = (-\infty, b)$.


We claim that this countable union is equal to $(-\infty, b)$. To show equality of the two
rightmost sets, first assume that $x \in \bigcup_{n=1}^{\infty} A_n$. Then $x \in A_n$ for some index n,
so $x \le b - 1/n < b$ (that is, x is strictly less than b), which implies that
$x \in (-\infty, b)$. Thus we have shown that $\bigcup_{n=1}^{\infty} A_n \subset (-\infty, b)$.

Now assume that $x \in (-\infty, b)$; then $x < b$. We can therefore find an integer
$n_0$ such that $x < b - 1/n_0 < b$, so $x \in A_{n_0}$ and so $x \in \bigcup_{n=1}^{\infty} A_n$. Thus
$(-\infty, b) \subset \bigcup_{n=1}^{\infty} A_n$. We conclude that $\bigcup_{n=1}^{\infty} A_n = (-\infty, b)$. Therefore $(-\infty, b) \in \mathcal{B}$.


Continuity of Probability 


Axiom III’ provides the key property that allows us to assign probabilities to events 
through the addition of the probabilities of mutually exclusive events. In this section 
we present two consequences of Axiom III’ that are very useful in finding the
probabilities of sequences of events. 

Let $A_1, A_2, \ldots$ be a sequence of events from a sigma field, such that


$A_1 \subset A_2 \subset \cdots \subset A_n \subset \cdots$.


The sequence is said to be an increasing sequence of events. For example, the sequence
of intervals $[a, b - 1/n]$ with $a < b - 1$ is an increasing sequence. The sequence
$(-n, a]$ is also increasing. We define the limit of an increasing sequence as the union of
all the events in the sequence:


$\lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$.


The union contains all elements of all events in the sequence and no other elements. 
Note that the countable union of events is also in the sigma field. 
We say that the sequence $A_1, A_2, \ldots$ is a decreasing sequence of events if


$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$.


For example, the sequence of intervals $(a - 1/n, a + 1/n)$ is a decreasing sequence, as
is the sequence $(-\infty, a + 1/n]$. We define the limit of a decreasing sequence as the
intersection of all the events in the sequence:


$\lim_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n$.
The intersection contains all elements that are in all the events of the sequence and no
other elements. If all the events in the sequence are in a sigma field, then the countable
intersection will also be in the sigma field.


Corollary 8 Continuity of Probability Function 


Let $A_1, A_2, \ldots$ be an increasing or decreasing sequence of events in F. Then:
$\lim_{n \to \infty} P[A_n] = P[\lim_{n \to \infty} A_n]$. (2.50)


We first show how the continuity result is applied in problems that involve events from the 
Borel field. 


Example 2.51 


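Find an expression for the probabilities of the following sequences of events from the Borel
field: $[a, b - 1/n]$, $(-n, a]$, $(a - 1/n, a + 1/n)$, $(-\infty, a + 1/n]$.


$\lim_{n \to \infty} P[\{x : a \le x \le b - 1/n\}] = P[\lim_{n \to \infty} \{x : a \le x \le b - 1/n\}] = P[\{x : a \le x < b\}]$.

$\lim_{n \to \infty} P[\{x : -n < x \le a\}] = P[\lim_{n \to \infty} \{x : -n < x \le a\}] = P[\{x : -\infty < x \le a\}]$.

$\lim_{n \to \infty} P[\{x : a - 1/n < x < a + 1/n\}] = P[\lim_{n \to \infty} \{x : a - 1/n < x < a + 1/n\}] = P[\{x = a\}]$.

$\lim_{n \to \infty} P[\{x : -\infty < x \le a + 1/n\}] = P[\lim_{n \to \infty} \{x : -\infty < x \le a + 1/n\}] = P[\{x : -\infty < x \le a\}]$.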
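
These limits can be observed numerically once a specific probability law is chosen. The Python sketch below assumes, purely for illustration, that probabilities of intervals are computed from the function $F(x) = 1/(1 + e^{-x})$ as $P[(u, v]] = F(v) - F(u)$; this law, the values a = 0 and b = 2, and the function names are our assumptions and are not part of the example. Because this F is continuous, single points have probability zero, so open and closed endpoints give the same values.

```python
import math

def F(x):
    """Assumed law: P[(-inf, x]] = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

a, b = 0.0, 2.0

def interval_probs(n):
    p1 = F(b - 1 / n) - F(a)          # P[[a, b - 1/n]]
    p2 = F(a) - F(-n)                 # P[(-n, a]]
    p3 = F(a + 1 / n) - F(a - 1 / n)  # P[(a - 1/n, a + 1/n)]
    p4 = F(a + 1 / n)                 # P[(-inf, a + 1/n]]
    return p1, p2, p3, p4

for n in (1, 10, 100, 1000, 10**6):
    print(n, interval_probs(n))

limits = (F(b) - F(a), F(a), 0.0, F(a))   # probabilities of the limiting events
print(limits)
```

The four columns approach the probabilities of the limiting events $[a, b)$, $(-\infty, a]$, $\{a\}$, and $(-\infty, a]$, respectively, as Eq. (2.50) asserts.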


To prove the continuity property for an increasing sequence of events, form the
following sequence of mutually exclusive events:


$B_1 = A_1$, $B_2 = A_2 - A_1$, ..., $B_n = A_n - A_{n-1}$, ... (2.51a)


The event $B_n$ contains the set of outcomes in $A_n$ not already present in $A_1, A_2, \ldots, A_{n-1}$
as illustrated in Fig. 2.21, so it is easy to show that $B_j \cap B_k = \emptyset$ for $j \ne k$ and that


$\bigcup_{j=1}^{n} B_j = \bigcup_{j=1}^{n} A_j$ for n = 1, 2, ... (2.51b)

as well as

$\bigcup_{j=1}^{\infty} B_j = \bigcup_{j=1}^{\infty} A_j$. (2.51c)


Since the sequence is expanding, we also have that:


$A_n = \bigcup_{j=1}^{n} A_j$. (2.51d)
FIGURE 2.21
Increasing sequence of events.


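The proof of continuity applies Axiom III’ to Eq. (2.51c):

$P[\bigcup_{j=1}^{\infty} A_j] = P[\bigcup_{j=1}^{\infty} B_j] = \sum_{j=1}^{\infty} P[B_j]$.

We express the summation as a limit and apply Axiom III:


$\sum_{j=1}^{\infty} P[B_j] = \lim_{n \to \infty} \sum_{j=1}^{n} P[B_j] = \lim_{n \to \infty} P[\bigcup_{j=1}^{n} B_j]$.


Finally we use Eqs. (2.51b) and (2.51d):


$\lim_{n \to \infty} P[\bigcup_{j=1}^{n} B_j] = \lim_{n \to \infty} P[\bigcup_{j=1}^{n} A_j] = \lim_{n \to \infty} P[A_n]$.


This proves continuity for increasing sequences:


$\lim_{n \to \infty} P[A_n] = P[\bigcup_{j=1}^{\infty} A_j] = P[\lim_{n \to \infty} A_n]$.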
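For decreasing sequences, we note that the sequence of complements of a decreasing
sequence is an increasing sequence. We therefore apply the continuity result
to the complements of the decreasing sequence $A_n$:


$P[\bigcup_{j=1}^{\infty} A_j^c] = \lim_{n \to \infty} P[A_n^c]$. (2.52a)

Next we apply DeMorgan’s rule:


$(\bigcup_{j=1}^{\infty} A_j^c)^c = \bigcap_{j=1}^{\infty} A_j$


and Corollary 1 to obtain:


$1 - P[\bigcap_{j=1}^{\infty} A_j] = P[\bigcup_{j=1}^{\infty} A_j^c]$.


We now use Eq. (2.52a):


$1 - P[\bigcap_{j=1}^{\infty} A_j] = P[\bigcup_{j=1}^{\infty} A_j^c] = \lim_{n \to \infty} P[A_n^c] = \lim_{n \to \infty} (1 - P[A_n])$


which gives the desired result:


$P[\bigcap_{j=1}^{\infty} A_j] = \lim_{n \to \infty} P[A_n]$. (2.52b)


SUMMARY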


A probability model is specified by identifying the sample space S, the event class 
of interest, and an initial probability assignment, a “probability law,” from which 
the probability of all events can be computed. 

The sample space S specifies the set of all possible outcomes. If it has a finite or 
countable number of elements, S is discrete; S is continuous otherwise. 


Events are subsets of S that result from specifying conditions that are of interest 
in the particular experiment. When S is discrete, events consist of the union of el- 
ementary events. When S is continuous, events consist of the union or intersec- 
tion of intervals in the real line. 

The axioms of probability specify a set of properties that must be satisfied by the 
probabilities of events. The corollaries that follow from the axioms provide rules 
for computing the probabilities of events in terms of the probabilities of other re- 
lated events. 

An initial probability assignment that specifies the probability of certain events 
must be determined as part of the modeling. If S is discrete, it suffices to specify 
the probabilities of the elementary events. If S is continuous, it suffices to specify 
the probabilities of intervals or of semi-infinite intervals. 


Combinatorial formulas are used to evaluate probabilities in experiments that 
have an equiprobable, finite number of outcomes. 

A conditional probability quantifies the effect of partial knowledge about the 
outcome of an experiment on the probabilities of events. It is particularly useful 
in sequential experiments where the outcomes of subexperiments constitute the 
“partial knowledge.” 

Bayes’ rule gives the a posteriori probability of an event given that another event 
has been observed. It can be used to synthesize decision rules that attempt to de- 
termine the most probable “cause” in light of an observation. 

Two events are independent if knowledge of the occurrence of one does not alter 
the probability of the other. Two experiments are independent if all of their re- 
spective events are independent. The notion of independence is useful for com- 
puting probabilities in experiments that involve noninteracting subexperiments. 
Many experiments can be viewed as consisting of a sequence of independent
subexperiments. In this chapter we presented the binomial, the multinomial, and 
the geometric probability laws as models that arise in this context. 

A Markov chain consists of a sequence of subexperiments in which the outcome 
of a subexperiment determines which subexperiment is performed next. The 
probability of a sequence of outcomes in a Markov chain is given by the product 
of the probability of the first outcome and the probabilities of all subsequent 
transitions. 

Computer simulation models use recursive equations to generate sequences of 
pseudo-random numbers. 


CHECKLIST OF IMPORTANT TERMS 


Axioms of Probability          Independent events
Bayes’ rule                    Independent experiments
Bernoulli trial                Initial probability assignment
Binomial coefficient           Markov chain
Binomial theorem               Mutually exclusive events
Certain event                  Null event
Conditional probability        Outcome
Continuous sample space        Partition
Discrete sample space          Probability law
Elementary event               Sample space
Event                          Set operations
Event class                    Theorem on total probability
                               Tree diagram
Section 2.1: Specifying Random Experiments


2.1. The (loose) minute hand in a clock is spun hard and the hour at which the hand comes to
rest is noted.

(a) What is the sample space?

(b) Find the sets corresponding to the events: A = “hand is in first 4 hours”; B = “hand
is between 2nd and 8th hours inclusive”; and D = “hand is in an odd hour.”

(c) Find the events: $A \cap B \cap D$, $A^c \cap B$, $A \cup (B \cap D^c)$, $(A \cup B) \cap D^c$.


2.2. A die is tossed twice and the number of dots facing up in each toss is counted and noted
in the order of occurrence.

(a) Find the sample space.

(b) Find the set A corresponding to the event “number of dots in first toss is not less than
number of dots in second toss.”

(c) Find the set B corresponding to the event “number of dots in first toss is 6.”

(d) Does A imply B or does B imply A?

(e) Find $A \cap B^c$ and describe this event in words.

(f) Let C correspond to the event “number of dots in dice differs by 2.” Find $A \cap C$.


2.3. Two dice are tossed and the magnitude of the difference in the number of dots facing up
in the two dice is noted.

(a) Find the sample space.

(b) Find the set A corresponding to the event “magnitude of difference is 3.”

(c) Express each of the elementary events in this experiment as the union of elementary
events from Problem 2.2.


2.4. A binary communication system transmits a signal X that is either a +2 voltage signal
or a −2 voltage signal. A malicious channel reduces the magnitude of the received
signal by the number of heads it counts in two tosses of a coin. Let Y be the resulting
signal.

(a) Find the sample space.

(b) Find the set of outcomes corresponding to the event “transmitted signal was definitely
+2.”

(c) Describe in words the event corresponding to the outcome Y = 0.


2.5. A desk drawer contains six pens, four of which are dry.

(a) The pens are selected at random one by one until a good pen is found. The sequence
of test results is noted. What is the sample space?
(b) Suppose that only the number, and not the sequence, of pens tested in part a is noted.
Specify the sample space.

(c) Suppose that the pens are selected one by one and tested until both good pens have
been identified, and the sequence of test results is noted. What is the sample space?

(d) Specify the sample space in part c if only the number of pens tested is noted.


2.6. Three friends (Al, Bob, and Chris) put their names in a hat and each draws a name from
the hat. (Assume Al picks first, then Bob, then Chris.)

(a) Find the sample space.

(b) Find the sets A, B, and C that correspond to the events “Al draws his name,” “Bob
draws his name,” and “Chris draws his name.”

(c) Find the set corresponding to the event, “no one draws his own name.”

(d) Find the set corresponding to the event, “everyone draws his own name.”

(e) Find the set corresponding to the event, “one or more draws his own name.”


2.7. Let M be the number of message transmissions in Experiment $E_6$.

(a) What is the set A corresponding to the event “M is even”?

(b) What is the set B corresponding to the event “M is a multiple of 3”?

(c) What is the set C corresponding to the event “6 or fewer transmissions are required”?

(d) Find the sets $A \cap B$, $A - B$, $A \cap B \cap C$ and describe the corresponding events in
words.


2.8. A number U is selected at random from the unit interval. Let the events A and B be:
A = “U differs from 1/2 by more than 1/4” and B = “1 − U is less than 1/2.” Find the
events $A \cap B$, $A^c \cap B$, $A \cup B$.


2.9. The sample space of an experiment is the real line. Let the events A and B correspond to
the following subsets of the real line: $A = (-\infty, r]$ and $B = (-\infty, s]$, where $r \le s$. Find
an expression for the event $C = (r, s]$ in terms of A and B. Show that $B = A \cup C$ and
$A \cap C = \emptyset$.


2.10. Use Venn diagrams to verify the set identities given in Eqs. (2.2) and (2.3). You will need
to use different colors or different shadings to denote the various regions clearly.


2.11. Show that:

(a) If event A implies B, and B implies C, then A implies C.

(b) If event A implies B, then $B^c$ implies $A^c$.


2.12. Show that if $A \cup B = A$ and $A \cap B = A$ then $A = B$.


2.13. Let A and B be events. Find an expression for the event “exactly one of the events A and
B occurs.” Draw a Venn diagram for this event.


2.14. Let A, B, and C be events. Find expressions for the following events:

(a) Exactly one of the three events occurs.

(b) Exactly two of the events occur.

(c) One or more of the events occur.

(d) Two or more of the events occur.

(e) None of the events occur.


2.15. Figure P2.1 shows three systems of three components, $C_1$, $C_2$, and $C_3$. Figure P2.1(a) is a
“series” system in which the system is functioning only if all three components are functioning.
Figure P2.1(b) is a “parallel” system in which the system is functioning as long as
at least one of the three components is functioning. Figure P2.1(c) is a “two-out-of-three”
system in which the system is functioning as long as at least two components are functioning.
Let $A_k$ be the event “component k is functioning.” For each of the three system
configurations, express the event “system is functioning” in terms of the events $A_k$.


(a) Series system    (b) Parallel system    (c) Two-out-of-three system


FIGURE P2.1 


2.16. A system has two key subsystems. The system is “up” if both of its subsystems are func- 
tioning. Triple redundant systems are configured to provide high reliability. The overall 
system is operational as long as one of three systems is “up.” Let $A_{jk}$ correspond to the
event “unit k in system j is functioning,” for j = 1, 2, 3 and k = 1, 2.

(a) Write an expression for the event “overall system is up.” 


(b) Explain why the above problem is equivalent to the problem of having a connection 
in the network of switches shown in Fig. P2.2. 


FIGURE P2.2


2.17. In a specified 6-AM-to-6-AM 24-hour period, a student wakes up at time t_1 and goes to
sleep at some later time t_2.

(a) Find the sample space and sketch it on the x-y plane if the outcome of this experiment
consists of the pair (t_1, t_2).

(b) Specify the set A and sketch the region on the plane corresponding to the event “student
is asleep at noon.”

(c) Specify the set B and sketch the region on the plane corresponding to the event “student
sleeps through breakfast (7–9 AM).”


(d) Sketch the region corresponding to A ∩ B and describe the corresponding event in
words.

2.18. A road crosses a railroad track at the top of a steep hill. The train cannot stop for oncoming
cars, and cars cannot see the train until it is too late. Suppose a train begins crossing the road
at time t_1 and that the car begins crossing the track at time t_2, where 0 < t_1 < T and 0 < t_2 < T.

(a) Find the sample space of this experiment.

(b) Suppose that it takes the train d_1 seconds to cross the road and it takes the car d_2 seconds
to cross the track. Find the set that corresponds to a collision taking place.

(c) Find the set that corresponds to a collision being missed by 1 second or less.

2.19. A random experiment has sample space S = {−1, 0, +1}.

(a) Find all the subsets of S.

(b) The outcome of a random experiment consists of pairs of outcomes from S where the
elements of the pair cannot be equal. Find the sample space S′ of this experiment.
How many subsets does S′ have?

2.20. (a) A coin is tossed twice and the sequence of heads and tails is noted. Let S be the sample
space of this experiment. Find all subsets of S.

(b) A coin is tossed twice and the number of heads is noted. Let S′ be the sample space
of this experiment. Find all subsets of S′.

(c) Consider parts a and b if the coin is tossed 10 times. How many subsets do S and
S′ have? How many bits are needed to assign a binary number to each possible
subset?


Section 2.2: The Axioms of Probability 


2.21. A die is tossed and the number of dots facing up is noted.

(a) Find the probability of the elementary events under the assumption that all faces of
the die are equally likely to be facing up after a toss.

(b) Find the probability of the events: A = {more than 3 dots}; B = {odd number
of dots}.

(c) Find the probability of A ∪ B, A ∩ B, A^c.

2.22. In Problem 2.2, a die is tossed twice and the number of dots facing up in each toss is
counted and noted in the order of occurrence.

(a) Find the probabilities of the elementary events.

(b) Find the probabilities of events A, B, C, A ∩ B^c, and A ∩ C defined in Problem 2.2.

2.23. A random experiment has sample space S = {a, b, c, d}. Suppose that P[{c, d}] = 3/8,
P[{b, c}] = 6/8, and P[{d}] = 1/8. Use the axioms of probability to
find the probabilities of the elementary events.

2.24. Find the probabilities of the following events in terms of P[A], P[B], and P[A ∩ B]:

(a) A occurs and B does not occur; B occurs and A does not occur.

(b) Exactly one of A or B occurs.

(c) Neither A nor B occurs.

2.25. Let the events A and B have P[A] = x, P[B] = y, and P[A ∪ B] = z. Use Venn diagrams
to find P[A ∩ B], P[A ∩ B^c], P[A ∪ B^c], P[A ∩ B^c], P[A^c ∪ B].

2.26. Show that
P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C]
+ P[A ∩ B ∩ C].

2.27. Use the argument from Problem 2.26 to prove Corollary 6 by induction.


2.28. A hexadecimal character consists of a group of three bits. Let A_i be the event “ith bit in a
character is a 1.”

(a) Find the probabilities for the following events: A_1, A_1 ∩ A_3, A_1 ∩ A_2 ∩ A_3, and
A_1 ∪ A_2 ∪ A_3. Assume that the values of bits are determined by tosses of a fair coin.

(b) Repeat part a if the coin is biased.

2.29. Let M be the number of message transmissions in Problem 2.7. Find the probabilities of
the events A, B, C, C^c, A ∩ B, A − B, A ∩ B ∩ C. Assume the probability of successful
transmission is 1/2.

2.30. Use Corollary 7 to prove the following:

(a) P[A ∪ B ∪ C] ≤ P[A] + P[B] + P[C].

(b) $P\left[\bigcup_{k=1}^{n} A_k\right] \le \sum_{k=1}^{n} P[A_k]$.

(c) $P\left[\bigcap_{k=1}^{n} A_k\right] \ge 1 - \sum_{k=1}^{n} P[A_k^c]$.

The second expression is called the union bound.

2.31. Let p be the probability that a single character appears incorrectly in this book. Use the
union bound for the probability of there being any errors in a page with n characters.

2.32. A die is tossed and the number of dots facing up is noted.

(a) Find the probability of the elementary events if faces with an even number of dots
are twice as likely to come up as faces with an odd number.

(b) Repeat parts b and c of Problem 2.21.

2.33. Consider Problem 2.1 where the minute hand in a clock is spun. Suppose that we now
note the minute at which the hand comes to rest.

(a) Suppose that the minute hand is very loose so the hand is equally likely to come to
rest anywhere in the clock. What are the probabilities of the elementary events?

(b) Now suppose that the minute hand is somewhat sticky and so the hand is 1/2 as likely
to land in the second minute as in the first, 1/3 as likely to land in the third
minute as in the first, and so on. What are the probabilities of the elementary events?

(c) Now suppose that the minute hand is very sticky and so the hand is 1/2 as likely to
land in the second minute as in the first, 1/2 as likely to land in the third minute as
in the second, and so on. What are the probabilities of the elementary events?

(d) Compare the probabilities that the hand lands in the last minute in parts a, b, and c.

2.34. A number x is selected at random in the interval [−1, 2]. Let the events A = {x < 0},
B = {|x − 0.5| < 0.5}, and C = {x > 0.75}.

(a) Find the probabilities of A, B, A ∩ B, and A ∩ C.

(b) Find the probabilities of A ∪ B, A ∪ C, and A ∪ B ∪ C, first, by directly evaluating
the sets and then their probabilities, and second, by using the appropriate axioms or
corollaries.

2.35. A number x is selected at random in the interval [−1, 2]. Numbers from the subinterval
[0, 2] occur half as frequently as those from [−1, 0).

(a) Find the probability assignment for an interval completely within [−1, 0); completely
within [0, 2]; and partly in each of the above intervals.

(b) Repeat Problem 2.34 with this probability assignment.

2.36. The lifetime of a device behaves according to the probability law P[(t, ∞)] = 1/t for t > 1.
Let A be the event “lifetime is greater than 4,” and B the event “lifetime is greater than 8.”

(a) Find the probability of A ∩ B and A ∪ B.

(b) Find the probability of the event “lifetime is greater than 6 but less than or equal to 12.”


2.37. Consider an experiment for which the sample space is the real line. A probability law
assigns probabilities to subsets of the form (−∞, r].

(a) Show that we must have P[(−∞, r]] ≤ P[(−∞, s]] when r < s.

(b) Find an expression for P[(r, s]] in terms of P[(−∞, r]] and P[(−∞, s]].

(c) Find an expression for P[(s, ∞)].

2.38. Two numbers (x, y) are selected at random from the interval [0, 1].

(a) Find the probability that the pair of numbers are inside the unit circle.

(b) Find the probability that y > 2x.
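
A derived answer to the problem above can be checked numerically. The following Python sketch is not part of the original problem set; it assumes that "selected at random" means x and y are independent and uniformly distributed on [0, 1], and the function name `estimate`, the seed, and the number of trials are illustrative choices.

```python
import random

def estimate(event, trials=200_000, seed=1):
    """Estimate P[event] for (x, y) drawn uniformly from the unit square."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if event(x, y):
            hits += 1
    return hits / trials

# Part (a): the point lies inside the unit circle x^2 + y^2 < 1.
p_circle = estimate(lambda x, y: x * x + y * y < 1.0)
# Part (b): the point lies above the line y = 2x.
p_line = estimate(lambda x, y: y > 2.0 * x)

print(f"inside unit circle: {p_circle:.3f}")
print(f"y > 2x:             {p_line:.3f}")
```

Each estimate is a relative frequency, so it should approach the area of the corresponding region of the unit square as the number of trials grows.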


*Section 2.3: Computing Probabilities Using Counting Methods 


2.39. The combination to a lock is given by three numbers from the set {0, 1, ..., 59}. Find the
number of combinations possible.

2.40. How many seven-digit telephone numbers are possible if the first number is not allowed
to be 0 or 1?

2.41. A pair of dice is tossed, a coin is flipped twice, and a card is selected at random from a
deck of 52 distinct cards. Find the number of possible outcomes.

2.42. A lock has two buttons: a “0” button and a “1” button. To open a door you need to push
the buttons according to a preset 8-bit sequence. How many sequences are there? Suppose
you press an arbitrary 8-bit sequence; what is the probability that the door opens? If
the first try does not succeed in opening the door, you try another number; what is the
probability of success?

2.43. A Web site requires that users create a password with the following specifications:

• Length of 8 to 10 characters

• Includes at least one special character {!, @, #, $, %, ^, &, *, (, ), +, =, {, }, |, <, >,
UES g [, ], ?}

• No spaces

• May contain numbers (0-9), lower and upper case letters (a-z, A-Z)

• Is case-sensitive.

How many passwords are there? How long would it take to try all passwords if a password
can be tested in 1 microsecond?

2.44. A multiple choice test has 10 questions with 3 choices each. How many ways are there to
answer the test? What is the probability that two papers have the same answers?

2.45. A student has five different t-shirts and three pairs of jeans (“brand new,” “broken in,”
and “perfect”).

(a) How many days can the student dress without repeating the combination of jeans
and t-shirt?

(b) How many days can the student dress without repeating the combination of jeans
and t-shirt and without wearing the same t-shirt on two consecutive days?

2.46. Ordering a “deluxe” pizza means you have four choices from 15 available toppings. How
many combinations are possible if toppings can be repeated? If they cannot be repeated?
Assume that the order in which the toppings are selected does not matter.

2.47. A lecture room has 60 seats. In how many ways can 45 students occupy the seats in the
room?

2.48. List all possible permutations of two distinct objects; three distinct objects; four distinct
objects. Verify that the number is n!.

2.49. A toddler pulls three volumes of an encyclopedia from a bookshelf and, after being scolded,
places them back in random order. What is the probability that the books are in the
correct order?

2.50. Five balls are placed at random in five buckets. What is the probability that each bucket
has a ball?

2.51. List all possible combinations of two objects from two distinct objects; three distinct objects;
four distinct objects. Verify that the number is given by the binomial coefficient.

2.52. A dinner party is attended by four men and four women. How many unique ways can the
eight people sit around the table? How many unique ways can the people sit around the
table with men and women alternating seats?

2.53. A hot dog vendor provides onions, relish, mustard, ketchup, Dijon ketchup, and hot peppers
for your hot dog. How many variations of hot dogs are possible using one condiment?
Two condiments? None, some, or all of the condiments?

2.54. A lot of 100 items contains k defective items. M items are chosen at random and tested.

(a) What is the probability that m are found defective? This is called the hypergeometric
distribution.

(b) A lot is accepted if 1 or fewer of the M items are defective. What is the probability
that the lot is accepted?
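
The counting argument behind the hypergeometric distribution can be evaluated numerically. The following Python sketch is not from the original text; the function names are hypothetical, and the values k = 5 and M = 10 are illustrative assumptions, since the problem leaves k and M general.

```python
from math import comb

def hypergeom_pmf(m, lot_size, k, M):
    """P[m defectives in a sample of M items drawn without replacement
    from a lot of lot_size items that contains k defectives]."""
    if m < 0 or m > M or m > k or M - m > lot_size - k:
        return 0.0  # impossible counts have probability zero
    return comb(k, m) * comb(lot_size - k, M - m) / comb(lot_size, M)

def accept_probability(lot_size, k, M):
    """Part (b): the lot is accepted if 0 or 1 defectives are found."""
    return hypergeom_pmf(0, lot_size, k, M) + hypergeom_pmf(1, lot_size, k, M)

# Illustrative values only; k = 5 and M = 10 are not given in the problem.
lot_size, k, M = 100, 5, 10
pmf = [hypergeom_pmf(m, lot_size, k, M) for m in range(M + 1)]
print("sum of pmf:", sum(pmf))
print("P[lot accepted]:", accept_probability(lot_size, k, M))
```

Summing the probabilities over all m is a useful sanity check: the total should equal 1 up to rounding.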

2.55. A park has N raccoons of which eight were previously captured and tagged. Suppose that
20 raccoons are captured. Find the probability that four of these are found to be tagged.
Denote this probability, which depends on N, by p(N). Find the value of N that maximizes
this probability. Hint: Compare the ratio p(N)/p(N − 1) to unity.
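
The maximization in the problem above can also be explored by brute force, which gives a check on the ratio argument suggested in the hint. The following Python sketch is an illustration only: it assumes the 20 captured raccoons are drawn without replacement, uses exact rational arithmetic so that ties are detected reliably, and caps the search at N = 200, an arbitrary choice.

```python
from fractions import Fraction
from math import comb

TAGGED, CAPTURED, FOUND = 8, 20, 4

def p(N):
    """P[FOUND tagged raccoons among CAPTURED drawn from a population of N]."""
    untagged = N - TAGGED
    needed = CAPTURED - FOUND  # untagged raccoons in the sample
    if untagged < needed:
        return Fraction(0)
    return Fraction(comb(TAGGED, FOUND) * comb(untagged, needed),
                    comb(N, CAPTURED))

# The smallest feasible population has 8 tagged and 16 untagged raccoons.
candidates = range(TAGGED + CAPTURED - FOUND, 201)
best = max(p(N) for N in candidates)
maximizers = [N for N in candidates if p(N) == best]
print("maximizing N:", maximizers)
print("maximum probability:", float(best))
```

If the list of maximizers contains more than one value, the ratio p(N)/p(N − 1) equals exactly 1 at the larger of them.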

2.56. A lot of 50 items has 40 good items and 10 bad items.

(a) Suppose we test five samples from the lot, with replacement. Let X be the number of
defective items in the sample. Find P[X = k].

(b) Suppose we test five samples from the lot, without replacement. Let Y be the number
of defective items in the sample. Find P[Y = k].

2.57. How many distinct permutations are there of four red balls, two white balls, and three
black balls?

2.58. A hockey team has 6 forwards, 4 defensemen, and 2 goalies. At any time, 3 forwards, 2 defensemen,
and 1 goalie can be on the ice. How many combinations of players can a coach
put on the ice?

2.59. Find the probability that in a class of 28 students exactly four were born in each of the
seven days of the week.

2.60. Show that $\binom{n}{k} = \binom{n}{n-k}$.

2.61. In this problem we derive the multinomial coefficient. Suppose we partition a set of n distinct
objects into J subsets B_1, B_2, ..., B_J of size k_1, ..., k_J, respectively, where k_i ≥ 0,
and k_1 + k_2 + ... + k_J = n.

(a) Let N_i denote the number of possible outcomes when the ith subset is selected.
Show that

$N_1 = \binom{n}{k_1}, \quad N_2 = \binom{n - k_1}{k_2}, \quad \ldots, \quad N_{J-1} = \binom{n - k_1 - \cdots - k_{J-2}}{k_{J-1}}.$

(b) Show that the number of partitions is then:

$N_1 N_2 \cdots N_{J-1} = \dfrac{n!}{k_1! \, k_2! \cdots k_J!}.$
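
The identity in the problem above can be spot-checked numerically before attempting the proof. The following Python sketch is not from the original text; the function names are hypothetical and the subset sizes (4, 2, 3) are an arbitrary illustration. The stagewise product also includes the last factor N_J, which always equals 1 because the final subset takes every remaining object.

```python
from math import comb, factorial, prod

def stagewise_count(n, sizes):
    """Multiply N_1 N_2 ... N_J: choose each subset from the objects left."""
    remaining, total = n, 1
    for k in sizes:
        total *= comb(remaining, k)  # ways to choose the next subset
        remaining -= k
    return total

def multinomial(n, sizes):
    """Closed form n! / (k_1! k_2! ... k_J!)."""
    return factorial(n) // prod(factorial(k) for k in sizes)

n, sizes = 9, (4, 2, 3)  # illustrative values with 4 + 2 + 3 = 9
assert sum(sizes) == n
print(stagewise_count(n, sizes), multinomial(n, sizes))
```

A numerical agreement for a few choices of sizes is not a proof, but it guards against indexing mistakes in the product of binomial coefficients.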


Section 2.4: Conditional Probability 


2.62. A die is tossed twice and the number of dots facing up is counted and noted in the order
of occurrence. Let A be the event “number of dots in first toss is not less than number of
dots in second toss,” and let B be the event “number of dots in first toss is 6.” Find P[A|B]
and P[B|A].

2.63. Use conditional probabilities and tree diagrams to find the probabilities for the elementary
events in the random experiments defined in parts a to d of Problem 2.5.

2.64. In Problem 2.6 (name in hat), find P[B ∩ C|A] and P[C|A ∩ B].

2.65. In Problem 2.29 (message transmissions), find P[B|A] and P[A|B].

2.66. In Problem 2.8 (unit interval), find P[B|A] and P[A|B].

2.67. In Problem 2.36 (device lifetime), find P[B|A] and P[A|B].

2.68. In Problem 2.33, let A = {hand rests in last 10 minutes} and B = {hand rests in last
5 minutes}. Find P[B|A] for parts a, b, and c.

2.69. A number x is selected at random in the interval [−1, 2]. Let the events A = {x < 0},
B = {|x − 0.5| < 0.5}, and C = {x > 0.75}. Find P[A|B], P[B|C], P[A|C], P[B|C^c].

2.70. In Problem 2.36, let A be the event “lifetime is greater than t,” and B the event “lifetime
is greater than 2t.” Find P[B|A]. Does the answer depend on t? Comment.

2.71. Find the probability that two or more students in a class of 20 students have the same
birthday. Hint: Use Corollary 1. How big should the class be so that the probability that
two or more students have the same birthday is 1/2?

2.72. A cryptographic hash takes a message as input and produces a fixed-length string as output,
called the digital fingerprint. A brute force attack involves computing the hash for a
large number of messages until a pair of distinct messages with the same hash is found.
Find the number of attempts required so that the probability of obtaining a match is 1/2.
How many attempts are required to find a matching pair if the digital fingerprint is 64 bits
long? 128 bits long?

2.73. (a) Find P[A|B] if A ∩ B = ∅; if A ⊂ B; if A ⊃ B.

(b) Show that if P[A|B] > P[A], then P[B|A] > P[B].

2.74. Show that P[A|B] satisfies the axioms of probability.

(i) 0 ≤ P[A|B] ≤ 1
(ii) P[S|B] = 1
(iii) If A ∩ C = ∅, then P[A ∪ C|B] = P[A|B] + P[C|B].

2.75. Show that P[A ∩ B ∩ C] = P[A|B ∩ C] P[B|C] P[C].

2.76. In each lot of 100 items, two items are tested, and the lot is rejected if either of the tested
items is found defective.

(a) Find the probability that a lot with k defective items is accepted.

(b) Suppose that when the production process malfunctions, 50 out of 100 items are defective.
In order to identify when the process is malfunctioning, how many items
should be tested so that the probability that one or more items are found defective is
at least 99%?

2.77. A nonsymmetric binary communications channel is shown in Fig. P2.3. Assume the input
is “0” with probability p and “1” with probability 1 − p.

(a) Find the probability that the output is 0.

(b) Find the probability that the input was 0 given that the output is 1. Find the
probability that the input is 1 given that the output is 1. Which input is more
probable?

FIGURE P2.3

2.78. The transmitter in Problem 2.4 is equally likely to send X = +2 as X = −2. The malicious
channel counts the number of heads in two tosses of a fair coin to decide by how
much to reduce the magnitude of the input to produce the output Y.

(a) Use a tree diagram to find the set of possible input-output pairs.

(b) Find the probabilities of the input-output pairs.

(c) Find the probabilities of the output values.

(d) Find the probability that the input was X = +2 given that Y = k.

2.79. One of two coins is selected at random and tossed three times. The first coin comes up
heads with probability p_1 and the second coin with probability p_2 = 2/3 > p_1 = 1/3.

(a) What is the probability that the number of heads is k?

(b) Find the probability that coin 1 was tossed given that k heads were observed, for
k = 0, 1, 2, 3.

(c) In part b, which coin is more probable when k heads have been observed?

(d) Generalize the solution in part b to the case where the selected coin is tossed m times.
In particular, find a threshold value T such that when k > T heads are observed, coin
1 is more probable, and when k < T are observed, coin 2 is more probable.

(e) Suppose that p_2 = 1 (that is, coin 2 is two-headed) and 0 < p_1 < 1. What is the
probability that we do not determine with certainty whether the coin is 1 or 2?

2.80. A computer manufacturer uses chips from three sources. Chips from sources A, B, and C
are defective with probabilities .005, .001, and .010, respectively. If a randomly selected
chip is found to be defective, find the probability that the manufacturer was A; that the
manufacturer was C. Assume that the proportions of chips from A, B, and C are 0.5, 0.1,
and 0.4, respectively.

2.81. A ternary communication system is shown in Fig. P2.4. Suppose that input symbols 0, 1,
and 2 each occur with probability 1/3.

(a) Find the probabilities of the output symbols.

(b) Suppose that a 1 is observed at the output. What is the probability that the input was
0? 1? 2?

FIGURE P2.4


Section 2.5: Independence of Events

2.82. Let S = {1, 2, 3, 4} and A = {1, 2}, B = {1, 3}, C = {1, 4}. Assume the outcomes are
equiprobable. Are A, B, and C independent events?

2.83. Let U be selected at random from the unit interval. Let A = {0 < U < 1/2},
B = {1/4 < U < 3/4}, and C = {1/2 < U < 1}. Are any of these events independent?

2.84. Alice and Mary practice free throws at the basketball court after school. Alice makes free
throws with probability p_a and Mary makes them with probability p_m. Find the probability
of the following outcomes when Alice and Mary each take one shot: Alice scores a
basket; either Alice or Mary scores a basket; both score; both miss.

2.85. Show that if A and B are independent events, then the pairs A and B^c, A^c and B, and A^c
and B^c are also independent.

2.86. Show that events A and B are independent if P[A|B] = P[A|B^c].

2.87. Let A, B, and C be events with probabilities P[A], P[B], and P[C].

(a) Find P[A ∪ B] if A and B are independent.

(b) Find P[A ∪ B] if A and B are mutually exclusive.

(c) Find P[A ∪ B ∪ C] if A, B, and C are independent.

(d) Find P[A ∪ B ∪ C] if A, B, and C are pairwise mutually exclusive.

2.88. An experiment consists of picking one of two urns at random and then selecting a ball
from the urn and noting its color (black or white). Let A be the event “urn 1 is selected”
and B the event “a black ball is observed.” Under what conditions are A and B independent?

2.89. Find the probabilities in Problem 2.14 assuming that events A, B, and C are independent.

2.90. Find the probabilities that the three types of systems are “up” in Problem 2.15. Assume
that all units in the system fail independently and that a type k unit fails with
probability p_k.

2.91. Find the probabilities that the system is “up” in Problem 2.16. Assume that all units in the
system fail independently and that a type k unit fails with probability p_k.

2.92. A random experiment is repeated a large number of times and the occurrence of events
A and B is noted. How would you test whether events A and B are independent?

2.93. Consider a very long sequence of hexadecimal characters. How would you test whether
the relative frequencies of the four bits in the hex characters are consistent with independent
tosses of a coin?

2.94. Compute the probability of the system in Example 2.35 being “up” when a second controller
is added to the system.

2.95. In the binary communication system in Example 2.26, find the value of ε for which the
input of the channel is independent of the output of the channel. Can such a channel be
used to transmit information?

2.96. In the ternary communication system in Problem 2.81, is there a choice of ε for which the
input of the channel is independent of the output of the channel?


Section 2.6: Sequential Experiments

2.97. A block of 100 bits is transmitted over a binary communication channel with probability
of bit error p = 10°.

(a) If the block has 1 or fewer errors then the receiver accepts the block. Find the probability
that the block is accepted.

(b) If the block has more than 1 error, then the block is retransmitted. Find the probability
that M retransmissions are required.

2.98. A fraction p of items from a certain production line is defective.

(a) What is the probability that there is more than one defective item in a batch of n
items?

(b) During normal production p = 10° but when production malfunctions p = 10^−1.
Find the size of a batch that should be tested so that if any items are found defective
we are 99% sure that there is a production malfunction.

2.99. A student needs eight chips of a certain type to build a circuit. It is known that 5% of
these chips are defective. How many chips should he buy for there to be a greater than
90% probability of having enough chips for the circuit?

2.100. Each of n terminals broadcasts a message in a given time slot with probability p.

(a) Find the probability that exactly one terminal transmits so the message is received by
all terminals without collision.

(b) Find the value of p that maximizes the probability of successful transmission in part a.

(c) Find the asymptotic value of the probability of successful transmission as n becomes
large.

2.101. A system contains eight chips. The lifetime of each chip has a Weibull probability law
with parameters λ and k = 2: P[(t, ∞)] = e^{−(λt)^k} for t ≥ 0. Find the probability that at
least two chips are functioning after 2/λ seconds.

2.102. A machine makes errors in a certain operation with probability p. There are two types of
errors. The fraction of errors that are type 1 is a, and type 2 is 1 − a.

(a) What is the probability of k errors in n operations?

(b) What is the probability of k_1 type 1 errors in n operations?

(c) What is the probability of k_2 type 2 errors in n operations?

(d) What is the joint probability of k_1 and k_2 type 1 and 2 errors, respectively, in n operations?

2.103. Three types of packets arrive at a router port. Ten percent of the packets are “expedited
forwarding (EF),” 30 percent are “assured forwarding (AF),” and 60 percent are “best effort
(BE).”

(a) Find the probability that k of N packets are not expedited forwarding.

(b) Suppose that packets arrive one at a time. Find the probability that k packets are
received before an expedited forwarding packet arrives.

(c) Find the probability that out of 20 packets, 4 are EF packets, 6 are AF packets, and 10
are BE.

2.104. A run-length coder segments a binary information sequence into strings that consist of
either a “run” of k “zeros” punctuated by a “one”, for k = 0, ..., m − 1, or a string of m
“zeros.” The m = 3 case is:


String    Run-length k
1         0
01        1
001       2
000       3


Suppose that the information is produced by a sequence of Bernoulli trials with 
P[“one”] = P[success] = p. 

(a) Find the probability of run-length k in the m = 3 case. 

(b) Find the probability of run-length k for general m. 
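
One way to check a derived run-length distribution is to tabulate it and confirm that the probabilities sum to 1. The following Python sketch is not from the original text; it assumes independent Bernoulli source bits with P[“one”] = p, as stated above, and the function name and the value p = 0.25 are illustrative.

```python
def run_length_pmf(p, m):
    """Return [P[k = 0], ..., P[k = m]] for Bernoulli(p) source bits."""
    q = 1.0 - p                          # probability of a "zero"
    pmf = [q**k * p for k in range(m)]   # k zeros followed by a "one"
    pmf.append(q**m)                     # m zeros with no terminating "one"
    return pmf

# Illustrative value p = 0.25 for the m = 3 case tabulated above.
pmf = run_length_pmf(p=0.25, m=3)
for k, prob in enumerate(pmf):
    print(k, prob)
print("total:", sum(pmf))
```

The last entry differs in form from the others because a string of m zeros ends the run without waiting for a “one.”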


2.105. The amount of time cars are parked in a parking lot follows a geometric probability law 
with p = 1/2. The charge for parking in the lot is $1 for each half-hour or less. 


(a) Find the probability that a car pays k dollars. 


(b) Suppose that there is a maximum charge of $6. Find the probability that a car pays k 
dollars. 


2.106. A biased coin is tossed repeatedly until heads has come up three times. Find the probability
that k tosses are required. Hint: Show that {“k tosses are required”} = A ∩ B,
where A = {“kth toss is heads”} and B = {“2 heads occurs in k − 1 tosses”}.
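
A simulation gives relative frequencies against which a derived formula for the number of tosses can be compared. The following Python sketch is not from the original text; the bias p = 0.3, the seed, and the number of trials are illustrative assumptions, and the function name is hypothetical.

```python
import random
from collections import Counter

def tosses_until_heads(target, p, rng):
    """Toss a coin with P[heads] = p until `target` heads have appeared."""
    tosses = heads = 0
    while heads < target:
        tosses += 1
        if rng.random() < p:
            heads += 1
    return tosses

rng = random.Random(7)   # fixed seed so that the run is repeatable
trials = 100_000
p = 0.3                  # illustrative bias; the problem leaves p general
counts = Counter(tosses_until_heads(3, p, rng) for _ in range(trials))
freq = {k: counts[k] / trials for k in sorted(counts)}

for k in range(3, 9):
    print(k, round(freq.get(k, 0.0), 4))
```

No run can finish in fewer than three tosses, so the smallest key in `freq` is 3, and its relative frequency estimates the probability of three heads in a row.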


2.107. An urn initially contains two black balls and two white balls. The following experiment is 
repeated indefinitely: A ball is drawn from the urn; if the color of the ball is the same as 
the majority of balls remaining in the urn, then the ball is put back in the urn. Otherwise 
the ball is left out. 


(a) Draw the trellis diagram for this experiment and label the branches by the transition 
probabilities. 


(b) Find the probabilities for all sequences of outcomes of length 2 and length 3. 


(c) Find the probability that the urn contains no black balls after three draws; no white 
balls after three draws. 


(d) Find the probability that the urn contains two black balls after n trials; two white 
balls after n trials. 


2.108. In Example 2.45, let p_0(n) and p_1(n) be the probabilities that urn 0 or urn 1 is used in the
nth subexperiment.

(a) Find p_0(1) and p_1(1).

(b) Express p_0(n + 1) and p_1(n + 1) in terms of p_0(n) and p_1(n).

(c) Evaluate p_0(n) and p_1(n) for n = 2, 3, 4.

(d) Find the solution to the recursion in part b with the initial conditions given in part a. 
(e) What are the urn probabilities as n approaches infinity? 


*Section 2.7: Synthesizing Randomness: Number Generators 


2.109. An urn experiment is to be used to simulate a random experiment with sample
space S = {1, 2, 3, 4, 5} and probabilities p_1 = 1/3, p_2 = 1/5, p_3 = 1/4, p_4 = 1/7, and
p_5 = 1 − (p_1 + p_2 + p_3 + p_4). How many balls should the urn contain? Generalize
the result to show that an urn experiment can be used to simulate any random experiment
with finite sample space and with probabilities given by rational numbers.

2.110. Suppose we are interested in using tosses of a fair coin to simulate a random experiment
in which there are six equally likely outcomes, where S = {0, 1, 2, 3, 4, 5}. The following
version of the “rejection method” is proposed:

1. Toss a fair coin three times and obtain a binary number by identifying heads with
zero and tails with one.

2. If the outcome of the coin tosses in step 1 is the binary representation for a number
in S, output the number. Otherwise, return to step 1.

(a) Find the probability that a number is produced in step 2.

(b) Show that the numbers that are produced in step 2 are equiprobable.

(c) Generalize the above algorithm to show how coin tossing can be used to simulate
any random urn experiment.

2.111. Use the rand function in Octave to generate 1000 pairs of numbers in the unit square.
Plot an x-y scattergram to confirm that the resulting points are uniformly distributed in
the unit square.

2.112. Apply the rejection method introduced above to generate points that are uniformly distributed
in the x > y portion of the unit square. Use the rand function to generate a pair
of numbers in the unit square. If x > y, accept the number. If not, select another pair.
Plot an x-y scattergram for the pair of accepted numbers and confirm that the resulting
points are uniformly distributed in the x > y region of the unit square.

2.113. The sample mean-squared value of the numerical outcomes X(1), X(2), ..., X(n) of a series
of n repetitions of an experiment is defined by

$\langle X^2 \rangle_n = \dfrac{1}{n} \sum_{j=1}^{n} X^2(j).$

(a) What would you expect this expression to converge to as the number of repetitions n
becomes very large?

(b) Find a recursion formula for $\langle X^2 \rangle_n$ similar to the one found in Problem 1.9.

2.114. The sample variance is defined as the mean-squared value of the variation of the samples
about the sample mean

$\langle V^2 \rangle_n = \dfrac{1}{n} \sum_{j=1}^{n} \{ X(j) - \langle X \rangle_n \}^2.$

Note that $\langle X \rangle_n$ also depends on the sample values. (It is customary to replace the n in
the denominator with n − 1 for technical reasons that will be discussed in Chapter 8. For
now we will use the above definition.)

(a) Show that the sample variance satisfies the following expression:

$\langle V^2 \rangle_n = \langle X^2 \rangle_n - \langle X \rangle_n^2.$

(b) Show that the sample variance satisfies the following recursion formula:

$\langle V^2 \rangle_n = \left(1 - \dfrac{1}{n}\right) \langle V^2 \rangle_{n-1} + \dfrac{1}{n}\left(1 - \dfrac{1}{n}\right) \left( X(n) - \langle X \rangle_{n-1} \right)^2,$

with $\langle V^2 \rangle_0 = 0$.

2.115. Suppose you have a program to generate a sequence of numbers U_n that is uniformly distributed
in [0, 1]. Let Y_n = αU_n + β.

(a) Find α and β so that Y_n is uniformly distributed in the interval [a, b].

(b) Let a = −5 and b = 15. Use Octave to generate Y_n and to compute the sample mean
and sample variance in 1000 repetitions. Compare the sample mean and sample variance
to (a + b)/2 and (b − a)²/12, respectively.
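
The comparison in part (b) can be sketched as follows. The problem asks for Octave; this illustration uses Python instead, and it is only one possible setup. It assumes the scaling α = b − a and β = a, which is one choice that maps [0, 1] onto [a, b], and it compares the sample statistics with the mean (a + b)/2 and variance (b − a)²/12 of a uniform law on [a, b]. The function names and the seed are hypothetical.

```python
import random

def uniform_samples(a, b, n, seed=0):
    """Generate n values Y = alpha * U + beta with U uniform on [0, 1)."""
    rng = random.Random(seed)    # fixed seed so that the run is repeatable
    alpha, beta = b - a, a       # assumed scaling that maps [0, 1] onto [a, b]
    return [alpha * rng.random() + beta for _ in range(n)]

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Mean-squared deviation about the sample mean (denominator n)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

a, b = -5.0, 15.0
ys = uniform_samples(a, b, 1000)
print("sample mean:    ", sample_mean(ys), " vs ", (a + b) / 2)
print("sample variance:", sample_variance(ys), " vs ", (b - a) ** 2 / 12)
```

With 1000 samples the two sample statistics should land close to, but not exactly on, the theoretical values.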


2.116. Use Octave to simulate 100 repetitions of the random experiment where a coin is tossed
16 times and the number of heads is counted.

(a) Confirm that your results are similar to those in Figure 2.18.

(b) Rerun the experiment with p = 0.25 and p = 0.75. Are the results as expected?
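
The experiment above can be sketched as follows. The problem asks for Octave; this illustration uses Python instead, with hypothetical function names and an arbitrary seed. It tabulates how often each number of heads occurs in 100 repetitions of 16 tosses, for each of the three values of p mentioned in the problem.

```python
import random
from collections import Counter

def heads_in_tosses(n_tosses, p, rng):
    """Count heads in n_tosses independent tosses with P[heads] = p."""
    return sum(rng.random() < p for _ in range(n_tosses))

def simulate(repetitions, n_tosses, p, seed=0):
    """Return a Counter mapping number of heads to number of occurrences."""
    rng = random.Random(seed)    # fixed seed so that the run is repeatable
    return Counter(heads_in_tosses(n_tosses, p, rng) for _ in range(repetitions))

for p in (0.5, 0.25, 0.75):
    counts = simulate(100, 16, p)
    mean = sum(k * c for k, c in counts.items()) / 100
    print(f"p = {p}: average number of heads = {mean:.2f}")
    for k in range(17):
        print(f"  {k:2d} {'*' * counts.get(k, 0)}")
```

The row of asterisks for each k is a rough text histogram; the bulk of the counts should shift toward 16p as p changes.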


*Section 2.8: Fine Points: Event Classes 


2.117. In Example 2.49, Homer maps the outcomes from Lisa’s sample space S_L = {r, g, t} into
a smaller sample space S_H = {R, G}: f(r) = R, f(g) = G, and f(t) = G.
Define the inverse image events as follows:

f^{−1}({R}) = A_1 = {r} and f^{−1}({G}) = A_2 = {g, t}.

Let A and B be events in Homer’s sample space.

(a) Show that f^{−1}(A ∪ B) = f^{−1}(A) ∪ f^{−1}(B).

(b) Show that f^{−1}(A ∩ B) = f^{−1}(A) ∩ f^{−1}(B).

(c) Show that f^{−1}(A^c) = f^{−1}(A)^c.

(d) Show that the results in parts a, b, and c hold for a general mapping f from a sample
space S to a set S′.

2.118. Let f be a mapping from a sample space S to a finite set S′ = {y_1, y_2, ..., y_n}.

(a) Show that the set of inverse images A_k = f^{−1}({y_k}) forms a partition of S.

(b) Show that any event B of S′ can be related to a union of A_k’s.

2.119. Let A be any subset of S. Show that the class of sets {∅, A, A^c, S} is a field.


*Section 2.9: Fine Points: Probabilities of Sequences of Events 


2.120. Find the countable union of the following sequences of events:

(a) A_n = [a + 1/n, b − 1/n].

(b) B_n = (−n, b − 1/n].

(c) C_n = [a + 1/n, b).

2.121. Find the countable intersection of the following sequences of events:

(a) A_n = (a − 1/n, b + 1/n).

(b) B_n = [a, b + 1/n).

(c) C_n = (a − 1/n, b].

2.122. (a) Show that the Borel field can be generated from the complements and countable
intersections and unions of open sets (a, b).

(b) Suggest other classes of sets that can generate the Borel field.

2.123. Find expressions for the probabilities of the events in Problem 2.120.

2.124. Find expressions for the probabilities of the events in Problem 2.121.

Problems Requiring Cumulative Knowledge


2.125. Compare the binomial probability law and the hypergeometric law introduced in Problem 2.54
as follows.

(a) Suppose a lot has 20 items of which five are defective. A batch of ten items is tested
without replacement. Find the probability that k are found defective for k = 0, ..., 10.
Compare this to the binomial probabilities with n = 10 and p = 5/20 = .25.

(b) Repeat but with a lot of 1000 items of which 250 are defective. A batch of ten items is
tested without replacement. Find the probability that k are found defective for
k = 0, ..., 10. Compare this to the binomial probabilities with n = 10 and p = 5/20
= .25.

2.126. Suppose that in Example 2.43, computer A sends each message to computer B simultaneously
over two unreliable radio links. Computer B can detect when errors have occurred
in either link. Let the probability of message transmission error in link 1 and link
2 be q_1 and q_2 respectively. Computer B requests retransmissions until it receives an
error-free message on either link.

(a) Find the probability that more than k transmissions are required.

(b) Find the probability that in the last transmission, the message on link 2 is received
free of errors.

2.127. In order for a circuit board to work, seven identical chips must be in working order. To
improve reliability, an additional chip is included in the board, and the design allows it to
replace any of the seven other chips when they fail.

(a) Find the probability p_b that the board is working in terms of the probability p that an
individual chip is working.

(b) Suppose that n circuit boards are operated in parallel, and that we require a 99.9%
probability that at least one board is working. How many boards are needed?

2.128. Consider a well-shuffled deck of cards consisting of 52 distinct cards, of which four are
aces and four are kings.

(a) Find the probability of obtaining an ace in the first draw.

(b) Draw a card from the deck and look at it. What is the probability of obtaining an
ace in the second draw? Does the answer change if you had not observed the first
draw?

(c) Suppose we draw seven cards from the deck. What is the probability that the seven
cards include three aces? What is the probability that the seven cards include two
kings? What is the probability that the seven cards include three aces and/or two
kings? 

(d) Suppose that the entire deck of cards is distributed equally among four players. What 
is the probability that each player gets an ace? 


CHAPTER 3

Discrete Random Variables


In most random experiments we are interested in a numerical attribute of the outcome
of the experiment. A random variable is defined as a function that assigns a numerical
value to the outcome of the experiment. In this chapter we introduce the concept of a
random variable and methods for calculating probabilities of events involving a random
variable. We focus on the simplest case, that of discrete random variables, and introduce
the probability mass function. We define the expected value of a random
variable and relate it to our intuitive notion of an average. We also introduce the
conditional probability mass function for the case where we are given partial information
about the random variable. These concepts and their extension in Chapter 4 provide us
with the tools to evaluate the probabilities and averages of interest in the design of
systems involving randomness.

Throughout the chapter we introduce important random variables and discuss
typical applications where they arise. We also present methods for generating random
variables. These methods are used in computer simulation models that predict the
behavior and performance of complex modern systems.


3.1 THE NOTION OF A RANDOM VARIABLE


The outcome of a random experiment need not be a number. However, we are usually 
interested not in the outcome itself, but rather in some measurement or numerical at- 
tribute of the outcome. For example, in n tosses of a coin, we may be interested in the 
total number of heads and not in the specific order in which heads and tails occur. In a 
randomly selected Web document, we may be interested only in the length of the doc- 
ument. In each of these examples, a measurement assigns a numerical value to the out- 
come of the random experiment. Since the outcomes are random, the results of the 
measurements will also be random. Hence it makes sense to talk about the probabili- 
ties of the resulting numerical values. The concept of a random variable formalizes this 
notion. 

A random variable X is a function that assigns a real number, $X(\zeta)$, to each outcome
$\zeta$ in the sample space of a random experiment. Recall that a function is simply a
rule for assigning a numerical value to each element of a set, as shown pictorially in
Fig. 3.1. The specification of a measurement on the outcome of a random experiment
defines a function on the sample space, and hence a random variable. The sample space
S is the domain of the random variable, and the set $S_X$ of all values taken on by X is the
range of the random variable. Thus $S_X$ is a subset of the set of all real numbers. We will
use the following notation: capital letters denote random variables, e.g., X or Y, and
lower case letters denote possible values of the random variables, e.g., x or y.

FIGURE 3.1
A random variable assigns a number $X(\zeta)$ to each outcome $\zeta$ in the
sample space S of a random experiment.


Example 3.1 Coin Tosses 


A coin is tossed three times and the sequence of heads and tails is noted. The sample space for this
experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Let X be the number of
heads in the three tosses. X assigns each outcome $\zeta$ in S a number from the set $S_X = \{0, 1, 2, 3\}$.
The table below lists the eight outcomes of S and the corresponding values of X.


$\zeta$:      HHH  HHT  HTH  THH  HTT  THT  TTH  TTT

$X(\zeta)$:    3    2    2    2    1    1    1    0


X is then a random variable taking on values in the set $S_X = \{0, 1, 2, 3\}$.


Example 3.2 A Betting Game 


A player pays $1.50 to play the following game: A coin is tossed three times and the number of 
heads X is counted. The player receives $1 if X = 2 and $8 if X = 3, but nothing otherwise. Let 
Y be the reward to the player. Y is a function of the random variable X and its outcomes can be
related back to the sample space of the underlying random experiment as follows:


$\zeta$:      HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
$X(\zeta)$:    3    2    2    2    1    1    1    0
$Y(\zeta)$:    8    1    1    1    0    0    0    0


Y is then a random variable taking on values in the set $S_Y = \{0, 1, 8\}$.

The above example shows that a function of a random variable produces another
random variable.

For random variables, the function or rule that assigns values to each outcome is 
fixed and deterministic, as, for example, in the rule “count the total number of dots fac- 
ing up in the toss of two dice.” The randomness in the experiment is complete as soon 
as the toss is done. The process of counting the dots facing up is deterministic.
Therefore the distribution of the values of a random variable X is determined by the
probabilities of the outcomes $\zeta$ in the random experiment. In other words, the randomness in
the observed values of X is induced by the underlying random experiment, and we
should therefore be able to compute the probabilities of the observed values of X in
terms of the probabilities of the underlying outcomes.


Example 3.3 Coin Tosses and Betting 


Let X be the number of heads in three independent tosses of a fair coin. Find the probability of
the event {X = 2}. Find the probability that the player in Example 3.2 wins \$8.
Note that $X(\zeta) = 2$ if and only if $\zeta$ is in {HHT, HTH, THH}. Therefore

$$\begin{aligned}
P[X = 2] &= P[\{HHT, HTH, THH\}] \\
&= P[\{HHT\}] + P[\{HTH\}] + P[\{THH\}] \\
&= 3/8.
\end{aligned}$$

The event {Y = 8} occurs if and only if the outcome $\zeta$ is HHH, therefore

$$P[Y = 8] = P[\{HHH\}] = 1/8.$$
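
The two probabilities in this example can be checked by listing the sample space and adding the probabilities of the outcomes that are mapped to the value of interest. The Python sketch below does this for three tosses of a fair coin; the names `outcomes`, `prob`, and `prob_in` are illustrative choices and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space of three tosses of a fair coin; each outcome has probability 1/8.
outcomes = ["".join(t) for t in product("HT", repeat=3)]
prob = {zeta: Fraction(1, 8) for zeta in outcomes}

def X(zeta):
    """Number of heads in the outcome zeta."""
    return zeta.count("H")

def Y(zeta):
    """Reward in the betting game of Example 3.2: 1 for two heads, 8 for three."""
    return {2: 1, 3: 8}.get(X(zeta), 0)

def prob_in(rv, B):
    """P[rv in B], summed over the outcomes zeta for which rv(zeta) is in B."""
    return sum(prob[zeta] for zeta in outcomes if rv(zeta) in B)

p_x2 = prob_in(X, {2})
p_y8 = prob_in(Y, {8})
print(p_x2, p_y8)
```

Exact fractions are used so that the results can be compared directly with the hand calculation.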


Example 3.3 illustrates a general technique for finding the probabilities of events
involving the random variable X. Let the underlying random experiment have sample
space S and event class F. To find the probability of a subset B of R, e.g., $B = \{x_k\}$, we
need to find the outcomes in S that are mapped to B, that is,

$$A = \{\zeta : X(\zeta) \in B\} \tag{3.1}$$

as shown in Fig. 3.2. If event A occurs then $X(\zeta) \in B$, so event B occurs. Conversely, if
event B occurs, then the value $X(\zeta)$ implies that $\zeta$ is in A, so event A occurs. Thus the
probability that X is in B is given by:

$$P[X \in B] = P[A] = P[\{\zeta : X(\zeta) \in B\}]. \tag{3.2}$$

FIGURE 3.2
$P[X \text{ in } B] = P[\zeta \text{ in } A]$

We refer to A and B as equivalent events.

In some random experiments the outcome $\zeta$ is already the numerical value we
are interested in. In such cases we simply let $X(\zeta) = \zeta$, that is, the identity function, to
obtain a random variable.


*3.1.1 Fine Point: Formal Definition of a Random Variable


In going from Eq. (3.1) to Eq. (3.2) we actually need to check that the event A is in F,
because only events in F have probabilities assigned to them. The formal definition of
a random variable in Chapter 4 will explicitly state this requirement.

If the event class F consists of all subsets of S, then the set A will always be in F,
and any function from S to R will be a random variable. However, if the event class F
does not consist of all subsets of S, then some functions from S to R may not be random
variables, as illustrated by the following example.


Example 3.4 A Function That Is Not a Random Variable


This example shows why the definition of a random variable requires that we check that the set
A is in F. An urn contains three balls. One ball is electronically coded with a label 00. Another
ball is coded with 01, and the third ball has a 10 label. The sample space for this experiment is
S = {00, 01, 10}. Let the event class F consist of all unions, intersections, and complements of
the events $A_1 = \{00, 10\}$ and $A_2 = \{01\}$. In this event class, the outcome 00 cannot be
distinguished from the outcome 10. For example, this could result from a faulty label reader that
cannot distinguish between 00 and 10. The event class has four events
$F = \{\emptyset, \{00, 10\}, \{01\}, \{00, 01, 10\}\}$. Let the probability assignment for the events in F be
P[{00, 10}] = 2/3 and P[{01}] = 1/3.

Consider the following function X from S to R: X(00) = 0, X(01) = 1, X(10) = 2. To
find the probability of {X = 0}, we need the probability of $\{\zeta : X(\zeta) = 0\} = \{00\}$. However,
{00} is not in the class F, and so X is not a random variable because we cannot determine the
probability that X = 0.
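
For a finite sample space, the requirement discussed in this example can be checked mechanically: a function is a random variable with respect to F when the inverse image of every subset of its range is an event in F. The following Python sketch applies that check to the urn experiment. The helper names and the second function `W` are hypothetical and are introduced only for illustration.

```python
from itertools import combinations

S = frozenset({"00", "01", "10"})
# Event class F from Example 3.4: outcomes 00 and 10 cannot be told apart.
F = {frozenset(), frozenset({"00", "10"}), frozenset({"01"}), S}

def inverse_image(X, values):
    """Return A = {zeta in S : X(zeta) in values}."""
    return frozenset(z for z in S if X[z] in values)

def is_random_variable(X):
    """True if the inverse image of every subset of the range of X is in F."""
    rng = sorted(set(X.values()))
    subsets = [set(c) for r in range(len(rng) + 1) for c in combinations(rng, r)]
    return all(inverse_image(X, B) in F for B in subsets)

X = {"00": 0, "01": 1, "10": 2}   # the function in Example 3.4
W = {"00": 0, "01": 1, "10": 0}   # hypothetical function that agrees on 00 and 10

x_ok = is_random_variable(X)
w_ok = is_random_variable(W)
print(x_ok, w_ok)
```

The function `W` passes the check because it never asks the label reader to separate 00 from 10, whereas `X` fails because {00} is not an event in F.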


3.2 DISCRETE RANDOM VARIABLES AND PROBABILITY MASS FUNCTION


A discrete random variable X is defined as a random variable that assumes values from
a countable set, that is, $S_X = \{x_1, x_2, x_3, \ldots\}$. A discrete random variable is said to be
finite if its range is finite, that is, $S_X = \{x_1, x_2, \ldots, x_n\}$. We are interested in finding the
probabilities of events involving a discrete random variable X. Since the sample space $S_X$
is discrete, we only need to obtain the probabilities for the events $A_k = \{\zeta : X(\zeta) = x_k\}$
in the underlying random experiment. The probabilities of all events involving X can be
found from the probabilities of the $A_k$’s.

The probability mass function (pmf) of a discrete random variable X is defined as:

$$p_X(x) = P[X = x] = P[\{\zeta : X(\zeta) = x\}] \quad \text{for } x \text{ a real number.} \tag{3.3}$$

Note that $p_X(x)$ is a function of x over the real line, and that $p_X(x)$ can be nonzero
only at the values $x_1, x_2, x_3, \ldots$. For $x_k$ in $S_X$, we have $p_X(x_k) = P[A_k]$.

FIGURE 3.3
Partition of sample space S associated with a discrete random variable.


The events $A_1, A_2, \ldots$ form a partition of S as illustrated in Fig. 3.3. To see this,
we first show that the events are disjoint. Let $j \ne k$, then

$$A_j \cap A_k = \{\zeta : X(\zeta) = x_j \text{ and } X(\zeta) = x_k\} = \emptyset$$

since each $\zeta$ is mapped into one and only one value in $S_X$. Next we show that S is the
union of the $A_k$’s. Every $\zeta$ in S is mapped into some $x_k$ so that every $\zeta$ belongs to an
event $A_k$ in the partition. Therefore:

$$S = A_1 \cup A_2 \cup \cdots.$$

All events involving the random variable X can be expressed as the union of
events $A_k$’s. For example, suppose we are interested in the event X in $B = \{x_2, x_5\}$,
then

$$\begin{aligned}
P[X \text{ in } B] &= P[\{\zeta : X(\zeta) = x_2\} \cup \{\zeta : X(\zeta) = x_5\}] \\
&= P[A_2 \cup A_5] = P[A_2] + P[A_5] \\
&= p_X(x_2) + p_X(x_5).
\end{aligned}$$


The pmf $p_X(x)$ satisfies three properties that provide all the information required
to calculate probabilities for events involving the discrete random variable X:

(i) $p_X(x) \ge 0$ for all x (3.4a)

(ii) $\sum_{x \in S_X} p_X(x) = \sum_{\text{all } k} p_X(x_k) = \sum_{\text{all } k} P[A_k] = 1$ (3.4b)

(iii) $P[X \text{ in } B] = \sum_{x \in B} p_X(x)$ where $B \subset S_X$. (3.4c)

Property (i) is true because the pmf values are defined as a probability, $p_X(x) = P[X = x]$.
Property (ii) follows because the events $A_k = \{X = x_k\}$ form a partition
of S. Note that the summations in Eqs. (3.4b) and (3.4c) will have a finite or infinite
number of terms depending on whether the random variable is finite or not. Next
consider property (iii). Any event B involving X is the union of elementary events, so by
Axiom III’ we have:

$$P[X \text{ in } B] = P\Big[\bigcup_{x \in B} \{\zeta : X(\zeta) = x\}\Big] = \sum_{x \in B} P[X = x] = \sum_{x \in B} p_X(x).$$

The pmf of X gives us the probabilities for all the elementary events from $S_X$.
The probability of any subset of $S_X$ is obtained from the sum of the corresponding
elementary events. In fact we have everything required to specify a probability law for the
outcomes in $S_X$. If we are only interested in events concerning X, then we can forget
about the underlying random experiment and its associated probability law and just
work with $S_X$ and the pmf of X.


Example 3.5 Coin Tosses and Binomial Random Variable 


Let X be the number of heads in three independent tosses of a coin, where each toss
comes up heads with probability p. Find the pmf of X.
Proceeding as in Example 3.3, we find:

$$\begin{aligned}
p_0 &= P[X = 0] = P[\{TTT\}] = (1 - p)^3, \\
p_1 &= P[X = 1] = P[\{HTT\}] + P[\{THT\}] + P[\{TTH\}] = 3(1 - p)^2 p, \\
p_2 &= P[X = 2] = P[\{HHT\}] + P[\{HTH\}] + P[\{THH\}] = 3(1 - p)p^2, \\
p_3 &= P[X = 3] = P[\{HHH\}] = p^3.
\end{aligned}$$


Example 3.6 A Betting Game 


A player receives $1 if the number of heads in three coin tosses is 2, $8 if the number is 3, but 
nothing otherwise. Find the pmf of the reward Y. 


$$\begin{aligned}
p_Y(0) &= P[\zeta \in \{TTT, TTH, THT, HTT\}] = 4/8 = 1/2 \\
p_Y(1) &= P[\zeta \in \{THH, HTH, HHT\}] = 3/8 \\
p_Y(8) &= P[\zeta \in \{HHH\}] = 1/8.
\end{aligned}$$

Note that $p_Y(0) + p_Y(1) + p_Y(8) = 1$.


Figures 3.4(a) and (b) show the graph of $p_X(x)$ versus x for the random variables
in Examples 3.5 and 3.6, respectively. In general, the graph of the pmf of a discrete
random variable has vertical arrows of height $p_X(x_k)$ at the values $x_k$ in $S_X$. We may view
the total probability as one unit of mass and $p_X(x)$ as the amount of probability mass
that is placed at each of the discrete points $x_1, x_2, \ldots$. The relative values of pmf at
different points give an indication of the relative likelihoods of occurrence.


Example 3.7 Random Number Generator 


A random number generator produces an integer number X that is equally likely to be any
element in the set $S_X = \{0, 1, 2, \ldots, M - 1\}$. Find the pmf of X.
For each k in $S_X$, we have $p_X(k) = 1/M$. Note that

$$p_X(0) + p_X(1) + \cdots + p_X(M - 1) = 1.$$

We call X the uniform random variable in the set $\{0, 1, \ldots, M - 1\}$.

FIGURE 3.4
(a) Graph of pmf in three coin tosses; (b) Graph of pmf in betting game.


Example 3.8 Bernoulli Random Variable 


Let A be an event of interest in some random experiment, e.g., a device is not defective. We
say that a “success” occurs if A occurs when we perform the experiment. The Bernoulli
random variable $I_A$ is equal to 1 if A occurs and zero otherwise, and is given by the indicator
function for A:

$$I_A(\zeta) = \begin{cases} 0 & \text{if } \zeta \text{ not in } A \\ 1 & \text{if } \zeta \text{ in } A. \end{cases} \tag{3.5a}$$

Find the pmf of $I_A$.
$I_A(\zeta)$ is a finite discrete random variable with values from $S_I = \{0, 1\}$, with pmf:

$$\begin{aligned}
p_I(0) &= P[\{\zeta : \zeta \in A^c\}] = 1 - p \\
p_I(1) &= P[\{\zeta : \zeta \in A\}] = p,
\end{aligned} \tag{3.5b}$$

where $p = P[A]$. We call $I_A$ the Bernoulli random variable. Note that $p_I(0) + p_I(1) = 1$.


Example 3.9 Message Transmissions 


Let X be the number of times a message needs to be transmitted until it arrives correctly at its
destination. Find the pmf of X. Find the probability that X is an even number.

X is a discrete random variable taking on values from $S_X = \{1, 2, 3, \ldots\}$. Let p be the
probability that a transmission is error-free and let q = 1 - p. The event
{X = k} occurs if the underlying experiment finds k - 1 consecutive erroneous transmissions
(“failures”) followed by an error-free one (“success”):

$$p_X(k) = P[X = k] = P[00 \ldots 01] = (1 - p)^{k-1} p = q^{k-1} p \quad k = 1, 2, \ldots. \tag{3.6}$$

We call X the geometric random variable, and we say that X is geometrically distributed. In
Eq. (2.42b), we saw that the sum of the geometric probabilities is 1.

$$P[X \text{ is even}] = \sum_{k=1}^{\infty} p_X(2k) = p \sum_{k=1}^{\infty} q^{2k-1} = \frac{pq}{1 - q^2} = \frac{q}{1 + q}.$$

Example 3.10 Transmission Errors 


A binary communications channel introduces a bit error in a transmission with probability p. Let
X be the number of errors in n independent transmissions. Find the pmf of X. Find the
probability of one or fewer errors.

X takes on values in the set $S_X = \{0, 1, \ldots, n\}$. Each transmission results in a “0” if there is
no error and a “1” if there is an error, P[“1”] = p and P[“0”] = 1 - p. The probability of k errors
in n bit transmissions is given by the probability of an error pattern that has k 1’s and n - k 0’s:

$$p_X(k) = P[X = k] = \binom{n}{k} p^k (1 - p)^{n-k} \quad k = 0, 1, \ldots, n. \tag{3.7}$$

We call X the binomial random variable, with parameters n and p. In Eq. (2.39b), we saw that the
sum of the binomial probabilities is 1.

$$P[X \le 1] = \binom{n}{0} p^0 (1 - p)^n + \binom{n}{1} p^1 (1 - p)^{n-1} = (1 - p)^n + np(1 - p)^{n-1}.$$

Finally, let’s consider the relationship between relative frequencies and the pmf
$p_X(x_k)$. Suppose we perform n independent repetitions to obtain n observations of
the discrete random variable X. Let $N_k(n)$ be the number of times the event $X = x_k$
occurs and let $f_k(n) = N_k(n)/n$ be the corresponding relative frequency. As n
becomes large we expect that $f_k(n) \to p_X(x_k)$. Therefore the graph of relative
frequencies should approach the graph of the pmf. Figure 3.5(a) shows the graph of relative
frequencies for 1000 repetitions of an experiment that generates a uniform random
variable from the set $\{0, 1, \ldots, 7\}$ and the corresponding pmf. Figure 3.5(b) shows the
graph of relative frequencies and pmf for a geometric random variable with p = 1/2
and n = 1000 repetitions. In both cases we see that the graph of relative frequencies
approaches that of the pmf.

FIGURE 3.5
(a) Relative frequencies and corresponding uniform pmf; (b) Relative frequencies and corresponding geometric pmf.
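
The convergence of relative frequencies to the pmf can be observed in a short simulation. The sketch below uses Python's standard `random` module to generate a uniform random variable on {0, 1, ..., 7}. The seed and the number of repetitions are assumptions made for repeatability and are larger than the 1000 repetitions used in the figure, so that the deviations from 1/8 are small.

```python
import random
from collections import Counter

random.seed(1)            # fixed seed so that the run is repeatable
M = 8                     # values 0, 1, ..., 7
n = 100_000               # assumed number of independent repetitions

counts = Counter(random.randrange(M) for _ in range(n))
rel_freq = {k: counts[k] / n for k in range(M)}

# Largest gap between a relative frequency and the pmf value 1/M.
max_dev = max(abs(rel_freq[k] - 1 / M) for k in range(M))
print(rel_freq)
print(max_dev)
```

Rerunning with a smaller `n` gives visibly larger deviations, which is the behavior the figure illustrates.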


3.3 EXPECTED VALUE AND MOMENTS OF DISCRETE RANDOM VARIABLE


In order to completely describe the behavior of a discrete random variable, an entire
function, namely $p_X(x)$, must be given. In some situations we are interested in a few
parameters that summarize the information provided by the pmf. For example, Fig. 3.6
shows the results of many repetitions of an experiment that produces two random
variables. The random variable Y varies about the value 0, whereas the random variable X
varies around the value 5. It is also clear that X is more spread out than Y. In this
section we introduce parameters that quantify these properties.
The expected value or mean of a discrete random variable X is defined by

$$m_X = E[X] = \sum_{x \in S_X} x\,p_X(x) = \sum_k x_k\,p_X(x_k). \tag{3.8}$$

The expected value E[X] is defined if the above sum converges absolutely, that is,

$$E[|X|] = \sum_k |x_k|\,p_X(x_k) < \infty. \tag{3.9}$$

There are random variables for which Eq. (3.9) does not converge. In such cases, we say
that the expected value does not exist.

FIGURE 3.6
The graphs show 150 repetitions of the experiments yielding X and Y. It is clear
that X is centered about the value 5 while Y is centered about 0. It is also clear that
X is more spread out than Y.

If we view $p_X(x)$ as the distribution of mass on the points $x_1, x_2, \ldots$ in the real
line, then E[X] represents the center of mass of this distribution. For example, in Fig.
3.5(a), we can see that the pmf of a discrete random variable that is uniformly
distributed in $\{0, \ldots, 7\}$ has a center of mass at 3.5.


Example 3.11 Mean of Bernoulli Random Variable 


Find the expected value of the Bernoulli random variable $I_A$.
From Example 3.8, we have

$$E[I_A] = 0\,p_I(0) + 1\,p_I(1) = p,$$

where p is the probability of success in the Bernoulli trial.


Example 3.12 Three Coin Tosses and Binomial Random Variable 


Let X be the number of heads in three tosses of a fair coin. Find E[X].
Equation (3.8) and the pmf of X that was found in Example 3.5 give:

$$E[X] = \sum_{k=0}^{3} k\,p_X(k) = 0\left(\frac{1}{8}\right) + 1\left(\frac{3}{8}\right) + 2\left(\frac{3}{8}\right) + 3\left(\frac{1}{8}\right) = 1.5.$$

Note that the above is the n = 3, p = 1/2 case of a binomial random variable, which we will see
has E[X] = np.


Example 3.13 Mean of a Uniform Discrete Random Variable 


Let X be the random number generator in Example 3.7. Find E[X].
From Example 3.7 we have $p_X(j) = 1/M$ for $j = 0, \ldots, M - 1$, so

$$E[X] = \sum_{k=0}^{M-1} k\,\frac{1}{M} = \frac{1}{M}\{0 + 1 + 2 + \cdots + M - 1\} = \frac{(M - 1)M}{2M} = \frac{M - 1}{2}$$

where we used the fact that $1 + 2 + \cdots + L = (L + 1)L/2$. Note that for M = 8, E[X] = 3.5,
which is consistent with our observation of the center of mass in Fig. 3.5(a).
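
The three means found in Examples 3.11 to 3.13 all follow from one computation, the sum of x p_X(x) in Eq. (3.8). The sketch below stores each pmf as a Python dictionary from values to probabilities; this representation and the function name `expected_value` are illustrative choices, and the Bernoulli parameter is assumed to be 1/2.

```python
from fractions import Fraction
from math import comb

def expected_value(pmf):
    """E[X] as the sum of x * p_X(x) over the values with nonzero probability."""
    return sum(x * px for x, px in pmf.items())

p = Fraction(1, 2)                       # assumed success probability
bernoulli = {0: 1 - p, 1: p}
three_tosses = {k: comb(3, k) * p**k * (1 - p) ** (3 - k) for k in range(4)}
M = 8
uniform = {k: Fraction(1, M) for k in range(M)}

m_bernoulli = expected_value(bernoulli)
m_tosses = expected_value(three_tosses)
m_uniform = expected_value(uniform)
print(m_bernoulli, m_tosses, m_uniform)
```

With these choices the three results are p, np with n = 3, and (M - 1)/2 with M = 8.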


The use of the term “expected value” does not mean that we expect to observe 
E[X] when we perform the experiment that generates X. For example, the expected 
value of a Bernoulli trial is p, but its outcomes are always either 0 or 1. 

E[X] corresponds to the “average of X” in a large number of observations of X.
Suppose we perform n independent repetitions of the experiment that generates X,
and we record the observed values as x(1), x(2),..., x(n), where x(j) is the observation
in the jth experiment. Let $N_k(n)$ be the number of times $x_k$ is observed, and let
$f_k(n) = N_k(n)/n$ be the corresponding relative frequency. The arithmetic average, or
sample mean, of the observations, is:

$$\begin{aligned}
\langle X \rangle_n &= \frac{x(1) + x(2) + \cdots + x(n)}{n} = \frac{x_1 N_1(n) + x_2 N_2(n) + \cdots + x_k N_k(n) + \cdots}{n} \\
&= x_1 f_1(n) + x_2 f_2(n) + \cdots + x_k f_k(n) + \cdots \\
&= \sum_k x_k f_k(n).
\end{aligned} \tag{3.10}$$




The first numerator adds the observations in the order in which they occur, and the
second numerator counts how many times each $x_k$ occurs and then computes the total. As n
becomes large, we expect relative frequencies to approach the probabilities $p_X(x_k)$:

$$\lim_{n \to \infty} f_k(n) = p_X(x_k) \quad \text{for all } k. \tag{3.11}$$

Equation (3.10) then implies that:

$$\langle X \rangle_n = \sum_k x_k f_k(n) \to \sum_k x_k\,p_X(x_k) = E[X]. \tag{3.12}$$

Thus we expect the sample mean to converge to E[X] as n becomes large.


Example 3.14 A Betting Game 


A player at a fair pays $1.50 to toss a coin three times. The player receives $1 if the number of 
heads is 2, $8 if the number is 3, but nothing otherwise. Find the expected value of the reward Y. 
What is the expected value of the gain? 

The expected reward is:

$$E[Y] = 0\,p_Y(0) + 1\,p_Y(1) + 8\,p_Y(8) = 0\left(\frac{4}{8}\right) + 1\left(\frac{3}{8}\right) + 8\left(\frac{1}{8}\right) = \frac{11}{8}.$$

The expected gain is:

$$E[Y - 1.5] = \frac{11}{8} - \frac{12}{8} = -\frac{1}{8}.$$

Players lose 12.5 cents on average per game, so the house makes a nice profit over the long run.
In Example 3.19 we will see that some engineering designs also “bet” that users will behave a
certain way.


Example 3.15 Mean of a Geometric Random Variable 


Let X be the number of bytes in a message, and suppose that X has a geometric distribution with
parameter p. Find the mean of X.
X can take on arbitrarily large values since $S_X = \{1, 2, \ldots\}$. The expected value is:

$$E[X] = \sum_{k=1}^{\infty} k\,p\,q^{k-1} = p \sum_{k=1}^{\infty} k\,q^{k-1}.$$

This expression is readily evaluated by differentiating the series

$$\frac{1}{1 - x} = \sum_{k=0}^{\infty} x^k \tag{3.13}$$

to obtain

$$\frac{1}{(1 - x)^2} = \sum_{k=0}^{\infty} k\,x^{k-1}. \tag{3.14}$$

Letting x = q, we obtain

$$E[X] = p\,\frac{1}{(1 - q)^2} = \frac{1}{p}. \tag{3.15}$$

We see that X has a finite expected value as long as p > 0.
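
The closed form 1/p can be compared with partial sums of the series for E[X]. The sketch below is a numerical check only; the values of p and the truncation point K = 20000 are assumptions chosen so that the neglected tail is negligible.

```python
def geometric_mean_partial(p, K):
    """Partial sum of k * p * q**(k - 1) for k = 1, ..., K, with q = 1 - p."""
    q = 1 - p
    return sum(k * p * q ** (k - 1) for k in range(1, K + 1))

results = {}
for p in (0.5, 0.1, 0.01):
    results[p] = geometric_mean_partial(p, 20_000)
    print(p, results[p], 1 / p)
```

Smaller values of p need more terms before the partial sum settles, which reflects the longer tail of the pmf.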


For certain random variables large values occur sufficiently frequently that the
expected value does not exist, as illustrated by the following example.


Example 3.16 St. Petersburg Paradox


A fair coin is tossed repeatedly until a tail comes up. If X tosses are needed, then the casino
pays the gambler $Y = 2^X$ dollars. How much should the gambler be willing to pay to play this
game?

If the gambler plays this game a large number of times, then the payoff should be the
expected value of $Y = 2^X$. If the coin is fair, $P[X = k] = (1/2)^k$ and $P[Y = 2^k] = (1/2)^k$, so:

$$E[Y] = \sum_{k=1}^{\infty} 2^k\,p_Y(2^k) = \sum_{k=1}^{\infty} 2^k \left(\frac{1}{2}\right)^k = 1 + 1 + \cdots = \infty.$$

This game does indeed appear to offer the gambler a sweet deal, and so the gambler should be
willing to pay any amount to play the game! The paradox is that a sane person would not pay a
lot to play this game. Problem 3.34 discusses ways to resolve the paradox.
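
The divergence in this example is easy to see by truncating the sum for E[Y]: every term contributes exactly 1, so the first K terms add up to K. The function name below is an illustrative choice, and the truncation is only a device for showing the growth of the partial sums.

```python
from fractions import Fraction

def truncated_expected_payoff(K):
    """Sum of 2**k * P[Y = 2**k] for k = 1, ..., K, with P[Y = 2**k] = (1/2)**k."""
    return sum(2**k * Fraction(1, 2) ** k for k in range(1, K + 1))

partial_sums = [int(truncated_expected_payoff(K)) for K in (1, 10, 100, 1000)]
print(partial_sums)
```

The partial sums grow without bound as K increases, so the series in Eq. (3.9) does not converge for this random variable.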


Random variables with unbounded expected value are not uncommon and ap- 
pear in models where outcomes that have extremely large values are not that rare. Ex- 
amples include the sizes of files in Web transfers, frequencies of words in large bodies 
of text, and various financial and economic problems. 


3.3.1 Expected Value of Functions of a Random Variable


Let X be a discrete random variable, and let Z = g(X). Since X is discrete, Z = g(X)
will assume a countable set of values of the form $g(x_k)$ where $x_k \in S_X$. Denote the set
of values assumed by g(X) by $\{z_1, z_2, \ldots\}$. One way to find the expected value of Z is
to use Eq. (3.8), which requires that we first find the pmf of Z. Another way is to use
the following result:

$$E[Z] = E[g(X)] = \sum_k g(x_k)\,p_X(x_k). \tag{3.16}$$

To show Eq. (3.16) group the terms $x_k$ that are mapped to each value $z_j$:

$$\sum_k g(x_k)\,p_X(x_k) = \sum_j z_j \Big\{ \sum_{x_k : g(x_k) = z_j} p_X(x_k) \Big\} = \sum_j z_j\,p_Z(z_j) = E[Z].$$

The sum inside the braces is the probability of all terms $x_k$ for which $g(x_k) = z_j$, which
is the probability that $Z = z_j$, that is, $p_Z(z_j)$.


Example 3.17 Square-Law Device 


Let X be a noise voltage that is uniformly distributed in $S_X = \{-3, -1, +1, +3\}$ with $p_X(k) = 1/4$
for k in $S_X$. Find E[Z] where $Z = X^2$.
Using the first approach we find the pmf of Z:

$$\begin{aligned}
p_Z(9) &= P[X \in \{-3, +3\}] = p_X(-3) + p_X(3) = 1/2 \\
p_Z(1) &= p_X(-1) + p_X(1) = 1/2
\end{aligned}$$

and so

$$E[Z] = 1\left(\frac{1}{2}\right) + 9\left(\frac{1}{2}\right) = 5.$$

The second approach gives:

$$E[Z] = E[X^2] = \sum_k k^2\,p_X(k) = \frac{1}{4}\{(-3)^2 + (-1)^2 + 1^2 + 3^2\} = \frac{20}{4} = 5.$$
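
Both approaches in this example can be written as short computations on the pmf. The sketch below is one possible arrangement, with the pmf stored as a dictionary and with hypothetical helper names: `pmf_of_function` builds the pmf of Z = g(X) by grouping the values of X, and `expected_value` applies Eq. (3.16) when a function g is supplied.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_function(pmf_x, g):
    """pmf of Z = g(X): add p_X(x) over all x that are mapped to the same z."""
    pmf_z = defaultdict(Fraction)
    for x, px in pmf_x.items():
        pmf_z[g(x)] += px
    return dict(pmf_z)

def expected_value(pmf, g=lambda x: x):
    """Sum of g(x) * p(x); with the default g this is the mean of the pmf."""
    return sum(g(x) * px for x, px in pmf.items())

def square(x):
    return x * x

pmf_x = {x: Fraction(1, 4) for x in (-3, -1, 1, 3)}

pmf_z = pmf_of_function(pmf_x, square)       # first approach: find the pmf of Z
via_pmf_of_z = expected_value(pmf_z)
via_eq_3_16 = expected_value(pmf_x, square)  # second approach: Eq. (3.16)
print(pmf_z, via_pmf_of_z, via_eq_3_16)
```

The second approach avoids constructing the pmf of Z, which is the practical advantage of Eq. (3.16).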


Equation (3.16) implies several very useful results. Let Z be the function

$$Z = a\,g(X) + b\,h(X) + c$$

where a, b, and c are real numbers, then:

$$E[Z] = aE[g(X)] + bE[h(X)] + c. \tag{3.17a}$$

From Eq. (3.16) we have:

$$\begin{aligned}
E[Z] &= E[a\,g(X) + b\,h(X) + c] = \sum_k (a\,g(x_k) + b\,h(x_k) + c)\,p_X(x_k) \\
&= a \sum_k g(x_k)\,p_X(x_k) + b \sum_k h(x_k)\,p_X(x_k) + c \sum_k p_X(x_k) \\
&= aE[g(X)] + bE[h(X)] + c.
\end{aligned}$$

Equation (3.17a), by setting a, b, and/or c to 0 or 1, implies the following expressions:

$$E[g(X) + h(X)] = E[g(X)] + E[h(X)]. \tag{3.17b}$$
$$E[aX] = aE[X]. \tag{3.17c}$$
$$E[X + c] = E[X] + c. \tag{3.17d}$$
$$E[c] = c. \tag{3.17e}$$


Example 3.18 Square-Law Device 


The noise voltage X in the previous example is amplified and shifted to obtain Y = 2X + 10,
and then squared to produce $Z = Y^2 = (2X + 10)^2$. Find E[Z].

$$\begin{aligned}
E[Z] &= E[(2X + 10)^2] = E[4X^2 + 40X + 100] \\
&= 4E[X^2] + 40E[X] + 100 = 4(5) + 40(0) + 100 = 120.
\end{aligned}$$


Example 3.19 Voice Packet Multiplexer 


Let X be the number of voice packets containing active speech produced by n = 48 independent
speakers in a 10-millisecond period as discussed in Section 1.4. X is a binomial random variable
with parameter n and probability p = 1/3. Suppose a packet multiplexer transmits up to
M = 20 active packets every 10 ms, and any excess active packets are discarded. Let Z be the
number of packets discarded. Find E[Z].

The number of packets discarded every 10 ms is the following function of X:

$$Z = (X - M)^+ = \begin{cases} 0 & \text{if } X \le M \\ X - M & \text{if } X > M. \end{cases}$$

$$E[Z] = \sum_{k=20}^{48} (k - 20) \binom{48}{k} \left(\frac{1}{3}\right)^k \left(\frac{2}{3}\right)^{48-k} = 0.182.$$

Every 10 ms E[X] = np = 16 active packets are produced on average, so the fraction of active
packets discarded is 0.182/16 = 1.1%, which users will tolerate. This example shows that
engineered systems also play “betting” games where favorable statistics are exploited to use
resources efficiently. In this example, the multiplexer transmits 20 packets per period instead of 48
for a reduction of 28/48 = 58%.
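
The expected number of discarded packets is a finite sum and can be evaluated directly from the binomial pmf and Eq. (3.16). The sketch below uses the parameters stated in the example (n = 48, p = 1/3, M = 20); the function and variable names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P[X = k] for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p, M = 48, 1 / 3, 20

# E[Z] with Z = max(X - M, 0): only the values k > M contribute to the sum.
expected_discarded = sum(max(k - M, 0) * binomial_pmf(k, n, p) for k in range(n + 1))
fraction_discarded = expected_discarded / (n * p)

print(round(expected_discarded, 3), round(fraction_discarded, 4))
```

Changing `M` in this sketch shows how quickly the discard fraction falls as the multiplexer capacity is raised above the mean of 16 packets.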


3.3.2 Variance of a Random Variable


The expected value E[X], by itself, provides us with limited information about X. For
example, if we know that E[X] = 0, then it could be that X is zero all the time. However,
it is also possible that X can take on extremely large positive and negative values. We
are therefore interested not only in the mean of a random variable, but also in the
extent of the random variable’s variation about its mean. Let the deviation of the random
variable X about its mean be X - E[X], which can take on positive and negative
values. Since we are interested in the magnitude of the variations only, it is convenient to
work with the square of the deviation, which is always nonnegative, $D(X) = (X - E[X])^2$.
The expected value is a constant, so we will denote it by $m_X = E[X]$. The variance of
the random variable X is defined as the expected value of D:

$$\begin{aligned}
\sigma_X^2 = \text{VAR}[X] &= E[(X - m_X)^2] \\
&= \sum_{x \in S_X} (x - m_X)^2\,p_X(x) = \sum_{k} (x_k - m_X)^2\,p_X(x_k).
\end{aligned} \tag{3.18}$$

The standard deviation of the random variable X is defined by:

$$\sigma_X = \text{STD}[X] = \text{VAR}[X]^{1/2}. \tag{3.19}$$

By taking the square root of the variance we obtain a quantity with the same units as X.
An alternative expression for the variance can be obtained as follows:

$$\begin{aligned}
\text{VAR}[X] &= E[(X - m_X)^2] = E[X^2 - 2m_X X + m_X^2] \\
&= E[X^2] - 2m_X E[X] + m_X^2 \\
&= E[X^2] - m_X^2.
\end{aligned} \tag{3.20}$$

$E[X^2]$ is called the second moment of X. The nth moment of X is defined as $E[X^n]$.
Equations (3.17c), (3.17d), and (3.17e) imply the following useful expressions for
the variance. Let Y = X + c, then

$$\begin{aligned}
\text{VAR}[X + c] &= E[(X + c - (E[X] + c))^2] \\
&= E[(X - E[X])^2] = \text{VAR}[X].
\end{aligned} \tag{3.21}$$



Adding a constant to a random variable does not affect the variance. Let Z = cX,
then:

$$\text{VAR}[cX] = E[(cX - cE[X])^2] = E[c^2(X - E[X])^2] = c^2\,\text{VAR}[X]. \tag{3.22}$$

Scaling a random variable by c scales the variance by $c^2$ and the standard deviation by |c|.

Now let X = c, a random variable that is equal to a constant with probability 1, then

$$\text{VAR}[X] = E[(X - c)^2] = E[0] = 0. \tag{3.23}$$

A constant random variable has zero variance.


Example 3.20 Three Coin Tosses 


Let X be the number of heads in three tosses of a fair coin. Find VAR[X].

$$E[X^2] = 0\left(\frac{1}{8}\right) + 1^2\left(\frac{3}{8}\right) + 2^2\left(\frac{3}{8}\right) + 3^2\left(\frac{1}{8}\right) = 3 \quad \text{and}$$

$$\text{VAR}[X] = E[X^2] - m_X^2 = 3 - 1.5^2 = 0.75.$$

Recall that this is an n = 3, p = 1/2 binomial random variable. We see later that variance for the
binomial random variable is npq.


Example 3.21 Variance of Bernoulli Random Variable 


Find the variance of the Bernoulli random variable $I_A$.
$E[I_A^2] = 0\,p_I(0) + 1^2\,p_I(1) = p$ and so

$$\text{VAR}[I_A] = p - p^2 = p(1 - p) = pq. \tag{3.24}$$
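
The variance results above can be verified with exact arithmetic. The sketch below computes VAR[X] as E[X^2] - E[X]^2 for the three coin tosses and for a Bernoulli random variable, and also checks the effect of adding a constant and of scaling. The helper names and the Bernoulli parameter p = 1/3 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def mean(pmf):
    """E[X] for a pmf stored as a dictionary from values to probabilities."""
    return sum(x * px for x, px in pmf.items())

def variance(pmf):
    """VAR[X] = E[X^2] - E[X]^2."""
    m = mean(pmf)
    return sum(x * x * px for x, px in pmf.items()) - m * m

def transform(pmf, a, c):
    """pmf of a*X + c for a nonzero constant a."""
    return {a * x + c: px for x, px in pmf.items()}

half = Fraction(1, 2)
three_tosses = {k: comb(3, k) * half**3 for k in range(4)}
p = Fraction(1, 3)                       # assumed success probability
bernoulli = {0: 1 - p, 1: p}

var_tosses = variance(three_tosses)
var_bernoulli = variance(bernoulli)
var_shifted = variance(transform(three_tosses, 1, 10))   # VAR[X + 10]
var_scaled = variance(transform(three_tosses, -2, 0))    # VAR[-2X]
print(var_tosses, var_bernoulli, var_shifted, var_scaled)
```

The shifted variance equals the original one and the scaled variance is four times larger, in agreement with Eqs. (3.21) and (3.22).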


Example 3.22 Variance of Geometric Random Variable 


Find the variance of the geometric random variable.
Differentiate the term $(1 - x)^{-2}$ in Eq. (3.14) to obtain

$$\frac{2}{(1 - x)^3} = \sum_{k=0}^{\infty} k(k - 1)x^{k-2}.$$

Let x = q and multiply both sides by pq to obtain:

$$\frac{2pq}{(1 - q)^3} = pq \sum_{k=0}^{\infty} k(k - 1)q^{k-2} = \sum_{k=0}^{\infty} k(k - 1)\,p\,q^{k-1} = E[X^2] - E[X].$$

So the second moment is $E[X^2] = 2q/p^2 + E[X] = 2q/p^2 + 1/p = (1 + q)/p^2$
and the variance is

$$\text{VAR}[X] = E[X^2] - E[X]^2 = \frac{1 + q}{p^2} - \frac{1}{p^2} = \frac{q}{p^2}.$$


3.4 CONDITIONAL PROBABILITY MASS FUNCTION


In many situations we have partial information about a random variable X or about
the outcome of its underlying random experiment. We are interested in how this
information changes the probability of events involving the random variable. The
conditional probability mass function addresses this question for discrete random variables.


3.4.1 Conditional Probability Mass Function


Let X be a discrete random variable with pmf $p_X(x)$, and let C be an event that has
nonzero probability, P[C] > 0. See Fig. 3.7. The conditional probability mass function
of X is defined by the conditional probability:


$$p_X(x|C) = P[X = x\,|\,C] \quad \text{for } x \text{ a real number.} \tag{3.25}$$

Applying the definition of conditional probability we have:

$$p_X(x|C) = \frac{P[\{X = x\} \cap C]}{P[C]}. \tag{3.26}$$

The above expression has a nice intuitive interpretation: The conditional probability of the
event {X = x} is given by the probabilities of outcomes ζ for which both X(ζ) = x and
ζ are in C, normalized by P[C].

The conditional pmf satisfies Eqs. (3.4a)-(3.4c). Consider Eq. (3.4b). The set of
events $A_k = \{X = x_k\}$ is a partition of S, so


$$C = \bigcup_k (A_k \cap C),$$

and

$$\sum_{x_k \in S_X} p_X(x_k|C) = \sum_{\text{all } k} p_X(x_k|C) = \sum_{\text{all } k} \frac{P[\{X = x_k\} \cap C]}{P[C]} = \frac{1}{P[C]}\sum_{\text{all } k} P[A_k \cap C] = \frac{P[C]}{P[C]} = 1.$$


FIGURE 3.7
Conditional pmf of X given event C.

Similarly we can show that:


$$P[X \text{ in } B\,|\,C] = \sum_{x \in B} p_X(x|C) \quad \text{where } B \subset S_X.$$


Example 3.23 A Random Clock 


The minute hand in a clock is spun and the outcome ζ is the minute where the hand comes to
rest. Let X be the hour where the hand comes to rest. Find the pmf of X. Find the conditional
pmf of X given B = {first 4 hours}; given D = {1 < ζ ≤ 11}.

We assume that the hand is equally likely to rest at any of the minutes in the range
S = {1, 2, ..., 60}, so P[ζ = k] = 1/60 for k in S. X takes on values from $S_X$ = {1, 2, ..., 12}
and it is easy to show that $p_X(j) = 1/12$ for j in $S_X$. Since B = {1, 2, 3, 4}:


$$p_X(j|B) = \frac{P[\{X = j\} \cap B]}{P[B]} = \frac{P[X \in \{j\} \cap \{1, 2, 3, 4\}]}{P[X \in \{1, 2, 3, 4\}]} = \begin{cases} \dfrac{P[X = j]}{1/3} = \dfrac{1}{4} & \text{if } j \in \{1, 2, 3, 4\} \\ 0 & \text{otherwise.} \end{cases}$$


The event B above involves X only. The event D, however, is stated in terms of the
outcomes in the underlying experiment (i.e., minutes not hours), so the probability of the
intersection has to be expressed accordingly:


$$p_X(j|D) = \frac{P[\{X = j\} \cap D]}{P[D]} = \frac{P[\zeta : X(\zeta) = j \text{ and } \zeta \in \{2, \ldots, 11\}]}{P[\zeta \in \{2, \ldots, 11\}]} = \begin{cases} \dfrac{P[\zeta \in \{2, 3, 4, 5\}]}{10/60} = \dfrac{4}{10} & \text{for } j = 1 \\ \dfrac{P[\zeta \in \{6, 7, 8, 9, 10\}]}{10/60} = \dfrac{5}{10} & \text{for } j = 2 \\ \dfrac{P[\zeta \in \{11\}]}{10/60} = \dfrac{1}{10} & \text{for } j = 3. \end{cases}$$
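Because all 60 minutes are equally likely, both conditional pmf's in Example 3.23 reduce to counting outcomes. The sketch below is an illustration rather than part of the original example; it assumes, consistently with the results above, that minutes 1-5 map to hour 1, minutes 6-10 to hour 2, and so on, and the names `hour` and `conditional_pmf` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

MINUTES = range(1, 61)  # sample space S = {1, ..., 60}, equally likely


def hour(minute):
    """Map a minute to its hour X: minutes 1-5 -> 1, 6-10 -> 2, ..., 56-60 -> 12 (assumed mapping)."""
    return (minute + 4) // 5


def conditional_pmf(event):
    """Return p_X(j | C) where the event C is given as a predicate on the minute."""
    outcomes_in_c = [m for m in MINUTES if event(m)]
    pmf = defaultdict(Fraction)
    for m in outcomes_in_c:
        # Each outcome in C carries conditional probability (1/60) / P[C] = 1 / |C|.
        pmf[hour(m)] += Fraction(1, len(outcomes_in_c))
    return dict(pmf)


pmf_given_b = conditional_pmf(lambda m: hour(m) <= 4)  # B = {first 4 hours}
pmf_given_d = conditional_pmf(lambda m: 1 < m <= 11)   # D = {1 < minute <= 11}
print(pmf_given_b)
print(pmf_given_d)
```

Event B is expressed through X, while event D is expressed directly on the underlying minutes, which mirrors the distinction drawn in the example.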


Most of the time the event C is defined in terms of X, for example C = {X > 10}
or C = {a ≤ X ≤ b}. For $x_k$ in $S_X$, we have the following general result:


$$p_X(x_k|C) = \begin{cases} \dfrac{p_X(x_k)}{P[C]} & \text{if } x_k \in C \\ 0 & \text{if } x_k \notin C. \end{cases} \tag{3.27}$$


The above expression is determined entirely by the pmf of X.


Example 3.24 Residual Waiting Times 


Let X be the time required to transmit a message, where X is a uniform random variable with
$S_X$ = {1, 2, ..., L}. Suppose that a message has already been transmitting for m time units. Find
the probability that the remaining transmission time is j time units.

We are given C = {X > m}, so for m + 1 ≤ m + j ≤ L:


$$p_X(m + j\,|\,X > m) = \frac{P[X = m + j]}{P[X > m]} = \frac{1/L}{(L - m)/L} = \frac{1}{L - m} \quad \text{for } m + 1 \le m + j \le L. \tag{3.28}$$


X is equally likely to be any of the remaining L - m possible values. As m increases, 1/(L - m)
increases, implying that the end of the message transmission becomes increasingly likely.


Many random experiments have natural ways of partitioning the sample space S
into the union of disjoint events $B_1, B_2, \ldots, B_n$. Let $p_X(x|B_i)$ be the conditional pmf of
X given event $B_i$. The theorem on total probability allows us to find the pmf of X in
terms of the conditional pmf's:


$$p_X(x) = \sum_{i=1}^{n} p_X(x|B_i)P[B_i]. \tag{3.29}$$

Example 3.25 Device Lifetimes 


A production line yields two types of devices. Type 1 devices occur with probability α and work
for a relatively short time that is geometrically distributed with parameter r. Type 2 devices work
much longer, occur with probability 1 - α, and have a lifetime that is geometrically distributed
with parameter s. Let X be the lifetime of an arbitrary device. Find the pmf of X.

The random experiment that generates X involves selecting a device type and then
observing its lifetime. We can partition the sets of outcomes in this experiment into event $B_1$,
consisting of those outcomes in which the device is type 1, and $B_2$, consisting of those outcomes in
which the device is type 2. The conditional pmf's of X given the device type are:


$$p_{X|B_1}(k) = (1 - r)^{k-1}r \quad \text{for } k = 1, 2, \ldots$$
and
$$p_{X|B_2}(k) = (1 - s)^{k-1}s \quad \text{for } k = 1, 2, \ldots.$$
We obtain the pmf of X from Eq. (3.29):
$$p_X(k) = p_X(k|B_1)P[B_1] + p_X(k|B_2)P[B_2] = (1 - r)^{k-1}r\alpha + (1 - s)^{k-1}s(1 - \alpha) \quad \text{for } k = 1, 2, \ldots.$$


3.4.2 Conditional Expected Value


Let X be a discrete random variable, and suppose that we know that event B has
occurred. The conditional expected value of X given B is defined as:


$$m_{X|B} = E[X|B] = \sum_{x \in S_X} x\,p_X(x|B) = \sum_k x_k\,p_X(x_k|B) \tag{3.30}$$


where we apply the absolute convergence requirement on the summation. The conditional
variance of X given B is defined as:


$$\mathrm{VAR}[X|B] = E[(X - m_{X|B})^2\,|\,B] = \sum_k (x_k - m_{X|B})^2\,p_X(x_k|B) = E[X^2|B] - m_{X|B}^2.$$


Note that the variation is measured with respect to $m_{X|B}$, not $m_X$.
Let $B_1, B_2, \ldots, B_n$ be a partition of S, and let $p_X(x|B_i)$ be the conditional pmf of X
given event $B_i$. E[X] can be calculated from the conditional expected values $E[X|B_i]$:


$$E[X] = \sum_{i=1}^{n} E[X|B_i]P[B_i]. \tag{3.31a}$$


By the theorem on total probability we have:


$$E[X] = \sum_k x_k\,p_X(x_k) = \sum_k x_k\left\{\sum_{i=1}^{n} p_X(x_k|B_i)P[B_i]\right\} = \sum_{i=1}^{n}\left\{\sum_k x_k\,p_X(x_k|B_i)\right\}P[B_i] = \sum_{i=1}^{n} E[X|B_i]P[B_i],$$


where we first express $p_X(x_k)$ in terms of the conditional pmf's, and we then change
the order of summation. Using the same approach we can also show


$$E[g(X)] = \sum_{i=1}^{n} E[g(X)|B_i]P[B_i]. \tag{3.31b}$$


Example 3.26 Device Lifetimes 


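Find the mean and variance for the devices in Example 3.25.
The conditional mean and second moment of each device type are those of a geometric
random variable with the corresponding parameter. From Example 3.22, the second moment of a geometric random variable with success probability p is $(1 + q)/p^2$ with q = 1 - p, so:


$$m_{X|B_1} = 1/r \qquad E[X^2|B_1] = (2 - r)/r^2$$
$$m_{X|B_2} = 1/s \qquad E[X^2|B_2] = (2 - s)/s^2.$$


The mean and the second moment of X are then:


$$m_X = m_{X|B_1}\alpha + m_{X|B_2}(1 - \alpha) = \alpha/r + (1 - \alpha)/s$$


$$E[X^2] = E[X^2|B_1]\alpha + E[X^2|B_2](1 - \alpha) = \alpha(2 - r)/r^2 + (1 - \alpha)(2 - s)/s^2.$$


Finally, the variance of X is:


$$\mathrm{VAR}[X] = E[X^2] - m_X^2 = \frac{\alpha(2 - r)}{r^2} + \frac{(1 - \alpha)(2 - s)}{s^2} - \left(\frac{\alpha}{r} + \frac{1 - \alpha}{s}\right)^2.$$


Note that we do not use the conditional variances to find VAR[X] because Eq.
(3.31b) does not apply to conditional variances. (See Problem 3.40.) However, the
equation does apply to the conditional second moments.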
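The mixture moments in Example 3.26 can be cross-checked numerically. The sketch below is illustrative only: it sums the mixture pmf of Example 3.25 directly (truncated at a large `k_max`, an assumption that makes the neglected tail negligible for the parameter values used) and compares the result with the closed forms obtained from the conditional first and second moments. The parameter values and function names are hypothetical.

```python
def geometric_pmf(k, p):
    """pmf of a geometric random variable on {1, 2, ...} with success probability p."""
    return (1 - p) ** (k - 1) * p


def mixture_moments_by_summation(alpha, r, s, k_max=5000):
    """Mean and variance of the mixture pmf, summed term by term up to k_max."""
    mean = 0.0
    second_moment = 0.0
    for k in range(1, k_max + 1):
        pk = alpha * geometric_pmf(k, r) + (1 - alpha) * geometric_pmf(k, s)
        mean += k * pk
        second_moment += k * k * pk
    return mean, second_moment - mean**2


def mixture_moments_closed_form(alpha, r, s):
    """Mean and variance via Eq. (3.31a) and Eq. (3.31b) applied to g(X) = X^2."""
    mean = alpha / r + (1 - alpha) / s
    second_moment = alpha * (2 - r) / r**2 + (1 - alpha) * (2 - s) / s**2
    return mean, second_moment - mean**2


# Illustrative values: 30% short-lived devices (r = 0.5), 70% long-lived (s = 0.1).
summed = mixture_moments_by_summation(0.3, 0.5, 0.1)
closed = mixture_moments_closed_form(0.3, 0.5, 0.1)
print(summed, closed)
```

Averaging the conditional variances with the same weights would give a smaller number, since it omits the spread between the two conditional means.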


3.5 IMPORTANT DISCRETE RANDOM VARIABLES


Certain random variables arise in many diverse, unrelated applications. The pervasive- 
ness of these random variables is due to the fact that they model fundamental mecha- 
nisms that underlie random behavior. In this section we present the most important of 
the discrete random variables and discuss how they arise and how they are interrelat- 
ed. Table 3.1 summarizes the basic properties of the discrete random variables dis- 
cussed in this section. By the end of this chapter, most of these properties presented in 
the table will have been introduced. 


TABLE 3.1 Discrete random variables


Bernoulli Random Variable

$S_X = \{0, 1\}$
$p_0 = q = 1 - p$, $p_1 = p$, $0 \le p \le 1$
$E[X] = p$, $\mathrm{VAR}[X] = p(1 - p)$, $G_X(z) = (q + pz)$

Remarks: The Bernoulli random variable is the value of the indicator function $I_A$ for some event A; X = 1
if A occurs and 0 otherwise.


Binomial Random Variable

$S_X = \{0, 1, \ldots, n\}$
$p_k = \binom{n}{k}p^k(1 - p)^{n-k}$, $k = 0, 1, \ldots, n$
$E[X] = np$, $\mathrm{VAR}[X] = np(1 - p)$, $G_X(z) = (q + pz)^n$

Remarks: X is the number of successes in n Bernoulli trials and hence the sum of n independent, identically
distributed Bernoulli random variables.


Geometric Random Variable

First Version: $S_X = \{0, 1, 2, \ldots\}$
$p_k = p(1 - p)^k$, $k = 0, 1, \ldots$
$E[X] = \dfrac{1 - p}{p}$, $\mathrm{VAR}[X] = \dfrac{1 - p}{p^2}$, $G_X(z) = \dfrac{p}{1 - qz}$

Remarks: X is the number of failures before the first success in a sequence of independent Bernoulli trials.
The geometric random variable is the only discrete random variable with the memoryless property.

Second Version: $S_{X'} = \{1, 2, \ldots\}$
$p_k = p(1 - p)^{k-1}$, $k = 1, 2, \ldots$
$E[X'] = \dfrac{1}{p}$, $\mathrm{VAR}[X'] = \dfrac{1 - p}{p^2}$, $G_{X'}(z) = \dfrac{pz}{1 - qz}$

Remarks: X' = X + 1 is the number of trials until the first success in a sequence of independent Bernoulli
trials.


Negative Binomial Random Variable

$S_X = \{r, r + 1, \ldots\}$ where r is a positive integer
$p_k = \binom{k - 1}{r - 1}p^r(1 - p)^{k-r}$, $k = r, r + 1, \ldots$
$E[X] = \dfrac{r}{p}$, $\mathrm{VAR}[X] = \dfrac{r(1 - p)}{p^2}$, $G_X(z) = \left(\dfrac{pz}{1 - qz}\right)^r$

Remarks: X is the number of trials until the rth success in a sequence of independent Bernoulli trials.


Poisson Random Variable

$S_X = \{0, 1, 2, \ldots\}$
$p_k = \dfrac{\alpha^k}{k!}e^{-\alpha}$, $k = 0, 1, \ldots$ and $\alpha > 0$
$E[X] = \alpha$, $\mathrm{VAR}[X] = \alpha$, $G_X(z) = e^{\alpha(z - 1)}$

Remarks: X is the number of events that occur in one time unit when the time between events is
exponentially distributed with mean 1/α.


Uniform Random Variable

$S_X = \{1, 2, \ldots, L\}$
$p_k = \dfrac{1}{L}$, $k = 1, 2, \ldots, L$
$E[X] = \dfrac{L + 1}{2}$, $\mathrm{VAR}[X] = \dfrac{L^2 - 1}{12}$, $G_X(z) = \dfrac{z}{L}\,\dfrac{1 - z^L}{1 - z}$

Remarks: The uniform random variable occurs whenever outcomes are equally likely. It plays a key role in
the generation of random numbers.


Zipf Random Variable

$S_X = \{1, 2, \ldots, L\}$ where L is a positive integer
$p_k = \dfrac{1}{c_L}\,\dfrac{1}{k}$, $k = 1, 2, \ldots, L$ where $c_L$ is given by Eq. (3.45)
$E[X] = \dfrac{L}{c_L}$, $\mathrm{VAR}[X] = \dfrac{L(L + 1)}{2c_L} - \dfrac{L^2}{c_L^2}$

Remarks: The Zipf random variable has the property that a few outcomes occur frequently but most
outcomes occur rarely.


Discrete random variables arise mostly in applications where counting is in- 
volved. We begin with the Bernoulli random variable as a model for a single coin toss. 
By counting the outcomes of multiple coin tosses we obtain the binomial, geometric, 
and Poisson random variables. 


3.5.1 The Bernoulli Random Variable


Let A be an event related to the outcomes of some random experiment. The Bernoulli
random variable $I_A$ (defined in Example 3.8) equals one if the event A occurs, and zero
otherwise. $I_A$ is a discrete random variable since it assigns a number to each outcome
of S. Its range is {0, 1}, and its pmf is


$$p_I(0) = 1 - p \quad \text{and} \quad p_I(1) = p, \tag{3.32}$$


where P[A] = p.
In Example 3.11 we found the mean of $I_A$:


$$m_I = E[I_A] = p.$$


The sample mean in n independent Bernoulli trials is simply the relative frequency of
successes and converges to p as n increases:


$$\langle I_A \rangle_n = \frac{0\,N_0(n) + 1\,N_1(n)}{n} = f_1(n) \to p.$$


In Example 3.21 we found the variance of $I_A$:


$$\sigma_I^2 = \mathrm{VAR}[I_A] = p(1 - p) = pq.$$


The variance is quadratic in p, with value zero at p = 0 and p = 1 and maximum at
p = 1/2. This agrees with intuition since values of p close to 0 or to 1 imply a
preponderance of successes or failures and hence less variability in the observed values. The
maximum variability occurs when p = 1/2 which corresponds to the case that is most
difficult to predict.

Every Bernoulli trial, regardless of the event A, is equivalent to the tossing of a
biased coin with probability of heads p. In this sense, coin tossing can be viewed as
representative of a fundamental mechanism for generating randomness, and the Bernoulli
random variable is the model associated with it.


3.5.2 The Binomial Random Variable


Suppose that a random experiment is repeated n independent times. Let X be the
number of times a certain event A occurs in these n trials. X is then a random variable with
range $S_X = \{0, 1, \ldots, n\}$. For example, X could be the number of heads in n tosses of
a coin. If we let $I_j$ be the indicator function for the event A in the jth trial, then


$$X = I_1 + I_2 + \cdots + I_n,$$


that is, X is the sum of the Bernoulli random variables associated with each of the n
independent trials.
In Section 2.6, we found that X has probabilities that depend on n and p:


$$P[X = k] = p_X(k) = \binom{n}{k}p^k(1 - p)^{n-k} \quad \text{for } k = 0, \ldots, n. \tag{3.33}$$


X is called the binomial random variable. Figure 3.8 shows the pmf of X for n = 24 and
p = .2 and p = .5. Note that P[X = k] is maximum at $k_{\max} = \lfloor (n + 1)p \rfloor$, where $\lfloor x \rfloor$
denotes the largest integer that is smaller than or equal to x. When (n + 1)p is an
integer, then the maximum is achieved at $k_{\max}$ and $k_{\max} - 1$. (See Problem 3.50.)

FIGURE 3.8
Probability mass functions of binomial random variable with n = 24: (a) p = 0.2; (b) p = 0.5.

The factorial terms grow large very quickly and cause overflow problems in the
calculation of $\binom{n}{k}$. Equation (2.40) for the ratio of successive terms in the
pmf allows us to calculate $p_X(k + 1)$ in terms of $p_X(k)$ and delays the onset of
overflows:


$$\frac{p_X(k + 1)}{p_X(k)} = \frac{n - k}{k + 1}\,\frac{p}{1 - p} \quad \text{where } p_X(0) = (1 - p)^n. \tag{3.34}$$
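The recursion in Eq. (3.34) translates directly into a loop. The following is a minimal sketch, assuming 0 < p < 1; the function names are illustrative and not from the text. It builds the whole pmf from $p_X(0)$ without ever forming a factorial and then locates the most probable value.

```python
from math import comb


def binomial_pmf_recursive(n, p):
    """Return [p_X(0), ..., p_X(n)] using the ratio of successive terms, Eq. (3.34)."""
    pmf = [(1 - p) ** n]  # p_X(0) = (1 - p)^n
    for k in range(n):
        # p_X(k + 1) = p_X(k) * (n - k)/(k + 1) * p/(1 - p)
        pmf.append(pmf[-1] * (n - k) / (k + 1) * p / (1 - p))
    return pmf


def most_probable_value(pmf):
    """Index k at which the pmf is largest (the first one if there is a tie)."""
    return max(range(len(pmf)), key=lambda k: pmf[k])


pmf_half = binomial_pmf_recursive(24, 0.5)
k_max = most_probable_value(pmf_half)
print(k_max, sum(pmf_half))
```

For n = 24 and p = 0.5 the largest probability falls at the integer part of (n + 1)p = 12.5. For p = 0.2 the quantity (n + 1)p = 5 is an integer, so k = 4 and k = 5 tie and floating-point rounding decides which index the search reports.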


The binomial random variable arises in applications where there are two types of 
objects (i.e., heads/tails, correct/erroneous bits, good/defective items, active/silent speak- 
ers), and we are interested in the number of type 1 objects in a randomly selected batch 
of size n, where the type of each object is independent of the types of the other objects in 
the batch. Examples involving the binomial random variable were given in Section 2.6. 


Example 3.27 Mean of a Binomial Random Variable 


The expected value of X is:


$$\begin{aligned} E[X] &= \sum_{k=0}^{n} k\,p_X(k) = \sum_{k=0}^{n} k\binom{n}{k}p^k(1 - p)^{n-k} = \sum_{k=1}^{n} k\,\frac{n!}{k!(n - k)!}\,p^k(1 - p)^{n-k} \\ &= np\sum_{k=1}^{n} \frac{(n - 1)!}{(k - 1)!(n - k)!}\,p^{k-1}(1 - p)^{n-k} \\ &= np\sum_{j=0}^{n-1} \frac{(n - 1)!}{j!(n - 1 - j)!}\,p^{j}(1 - p)^{n-1-j} = np, \end{aligned} \tag{3.35}$$


where the first line uses the fact that the k = 0 term in the sum is zero, the second line cancels out
the k and factors np outside the summation, and the last line uses the fact that the summation is
equal to one since it adds all the terms in a binomial pmf with parameters n - 1 and p.
The expected value E[X] = np agrees with our intuition since we expect a fraction p of
the outcomes to result in success.


Example 3.28 Variance of a Binomial Random Variable 


To find $E[X^2]$ below, we remove the k = 0 term and then let k' = k - 1:


$$\begin{aligned} E[X^2] &= \sum_{k=0}^{n} k^2\,\frac{n!}{k!(n - k)!}\,p^k(1 - p)^{n-k} = \sum_{k=1}^{n} k\,\frac{n!}{(k - 1)!(n - k)!}\,p^k(1 - p)^{n-k} \\ &= np\sum_{k'=0}^{n-1} (k' + 1)\binom{n - 1}{k'}p^{k'}(1 - p)^{n-1-k'} \\ &= np\left\{\sum_{k'=0}^{n-1} k'\binom{n - 1}{k'}p^{k'}(1 - p)^{n-1-k'} + \sum_{k'=0}^{n-1} \binom{n - 1}{k'}p^{k'}(1 - p)^{n-1-k'}\right\} \\ &= np\{(n - 1)p + 1\} = np(np + q). \end{aligned}$$


In the third line we see that the first sum is the mean of a binomial random variable with
parameters (n - 1) and p, and hence equal to (n - 1)p. The second sum is the sum of the binomial
probabilities and hence equal to 1.

We obtain the variance as follows:


$$\sigma_X^2 = E[X^2] - E[X]^2 = np(np + q) - (np)^2 = npq = np(1 - p).$$


We see that the variance of the binomial is n times the variance of a Bernoulli random variable.
We observe that values of p close to 0 or to 1 imply smaller variance, and that the maximum
variability is when p = 1/2.


Example 3.29 Redundant Systems 


A system uses triple redundancy for reliability: Three microprocessors are installed and the
system is designed so that it operates as long as one microprocessor is still functional. Suppose that
the probability that a microprocessor is still active after t seconds is $p = e^{-\lambda t}$. Find the
probability that the system is still operating after t seconds.

Let X be the number of microprocessors that are functional at time t. X is a binomial
random variable with parameters n = 3 and p. Therefore:


$$P[X \ge 1] = 1 - P[X = 0] = 1 - (1 - e^{-\lambda t})^3.$$


The Geometric Random Variable 


The geometric random variable arises when we count the number M of independent
Bernoulli trials until the first occurrence of a success. M is called the geometric random
variable and it takes on values from the set {1, 2, ...}. In Section 2.6, we found that the
pmf of M is given by


$$P[M = k] = p_M(k) = (1 - p)^{k-1}p \quad k = 1, 2, \ldots, \tag{3.36}$$


where p = P[A] is the probability of "success" in each Bernoulli trial. Figure 3.5(b)
shows the geometric pmf for p = 1/2. Note that P[M = k] decays geometrically with k,
and that the ratio of consecutive terms is $p_M(k + 1)/p_M(k) = (1 - p) = q$. As p
increases, the pmf decays more rapidly.
The probability that M ≤ k can be written in closed form:


$$P[M \le k] = \sum_{j=1}^{k} pq^{j-1} = p\sum_{j'=0}^{k-1} q^{j'} = p\,\frac{1 - q^k}{1 - q} = 1 - q^k. \tag{3.37}$$


Sometimes we are interested in M' = M - 1, the number of failures before a success
occurs. We also refer to M' as a geometric random variable. Its pmf is:


$$P[M' = k] = P[M = k + 1] = (1 - p)^k p \quad k = 0, 1, 2, \ldots. \tag{3.38}$$

In Examples 3.15 and 3.22, we found the mean and variance of the geometric
random variable:


$$m_M = E[M] = 1/p \qquad \mathrm{VAR}[M] = \frac{1 - p}{p^2}.$$


We see that the mean and variance increase as p, the success probability, decreases.
The geometric random variable is the only discrete random variable that satisfies
the memoryless property:


$$P[M \ge k + j\,|\,M > j] = P[M \ge k] \quad \text{for all } j, k > 1.$$


(See Problems 3.54 and 3.55.) The above expression states that if a success has not
occurred in the first j trials, then the probability of having to perform at least k more
trials is the same as the probability of initially having to perform at least k trials. Thus,
each time a failure occurs, the system "forgets" and begins anew as if it were
performing the first trial.
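The memoryless property can be checked numerically from the pmf alone. The sketch below is an illustration, not part of the original text: it approximates each tail probability by summing a large but finite number of pmf terms (the truncation length `terms` is an assumption, chosen so that the neglected mass is far below floating-point resolution for the value of p used), and the function names are hypothetical.

```python
def geometric_pmf(k, p):
    """P[M = k] = (1 - p)^(k - 1) p for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p


def tail(k, p, terms=2000):
    """Approximate P[M >= k] by summing the pmf over k, k + 1, ..., k + terms - 1."""
    return sum(geometric_pmf(i, p) for i in range(k, k + terms))


def conditional_tail(k, j, p):
    """P[M >= k + j | M > j] = P[M >= k + j] / P[M >= j + 1]."""
    return tail(k + j, p) / tail(j + 1, p)


p = 0.3
for j in (1, 5, 20):
    # Each line should show two (nearly) equal numbers: the system "forgets" the j failures.
    print(j, conditional_tail(3, j, p), tail(3, p))
```

Both columns equal $q^{k-1}$, which here is $0.7^2 = 0.49$, regardless of how many failures j have already been observed.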

The geometric random variable arises in applications where one is interested in 
the time (i.e., number of trials) that elapses between the occurrence of events in a se- 
quence of independent experiments, as in Examples 2.11 and 2.43. Examples where the 
modified geometric random variable M’ arises are: number of customers awaiting ser- 
vice in a queueing system; number of white dots between successive black dots in a 
scan of a black-and-white document. 


The Poisson Random Variable 


In many applications, we are interested in counting the number of occurrences of an
event in a certain time period or in a certain region in space. The Poisson random
variable arises in situations where the events occur "completely at random" in time or
space. For example, the Poisson random variable arises in counts of emissions from
radioactive substances, in counts of demands for telephone connections, and in counts of
defects in a semiconductor chip.

The pmf for the Poisson random variable is given by


$$P[N = k] = p_N(k) = \frac{\alpha^k}{k!}e^{-\alpha} \quad \text{for } k = 0, 1, 2, \ldots, \tag{3.39}$$


where α is the average number of event occurrences in a specified time interval or region
in space. Figure 3.9 shows the Poisson pmf for several values of α. For α < 1, P[N = k]
is maximum at k = 0; for α > 1, P[N = k] is maximum at $\lfloor \alpha \rfloor$; if α is a positive integer,
then P[N = k] is maximum at k = α and at k = α - 1.

FIGURE 3.9
Probability mass functions of Poisson random variable (a) α = 0.75;
(b) α = 3; (c) α = 9.

The pmf of the Poisson random variable sums to one, since


$$\sum_{k=0}^{\infty} \frac{\alpha^k}{k!}e^{-\alpha} = e^{-\alpha}\sum_{k=0}^{\infty} \frac{\alpha^k}{k!} = e^{-\alpha}e^{\alpha} = 1,$$


where we used the fact that the second summation is the infinite series expansion for $e^{\alpha}$.
It is easy to show that the mean and variance of a Poisson random variable are
given by:


$$E[N] = \alpha \quad \text{and} \quad \sigma_N^2 = \mathrm{VAR}[N] = \alpha.$$
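For readers who want the omitted steps, one standard way to obtain these two results is sketched below. It uses the factorial moment E[N(N - 1)], which is a choice of route rather than necessarily the one the text has in mind; both sums are evaluated with the series expansion for $e^{\alpha}$.

```latex
\begin{aligned}
E[N] &= \sum_{k=0}^{\infty} k\,\frac{\alpha^k}{k!}e^{-\alpha}
      = \alpha e^{-\alpha}\sum_{k=1}^{\infty} \frac{\alpha^{k-1}}{(k-1)!}
      = \alpha e^{-\alpha}e^{\alpha} = \alpha, \\
E[N(N-1)] &= \sum_{k=0}^{\infty} k(k-1)\,\frac{\alpha^k}{k!}e^{-\alpha}
      = \alpha^2 e^{-\alpha}\sum_{k=2}^{\infty} \frac{\alpha^{k-2}}{(k-2)!}
      = \alpha^2, \\
\mathrm{VAR}[N] &= E[N(N-1)] + E[N] - E[N]^2
      = \alpha^2 + \alpha - \alpha^2 = \alpha.
\end{aligned}
```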


Example 3.30 Queries at a Call Center 


The number N of queries arriving in t seconds at a call center is a Poisson random variable with
α = λt where λ is the average arrival rate in queries/second. Assume that the arrival rate is four
queries per minute. Find the probability of the following events: (a) more than 4 queries in 10
seconds; (b) fewer than 5 queries in 2 minutes.

The arrival rate in queries/second is λ = 4 queries/60 sec = 1/15 queries/sec. In part a, the
time interval is 10 seconds, so we have a Poisson random variable with α = (1/15 queries/sec) ×
10 seconds = 10/15 queries. The probability of interest is evaluated numerically:


$$P[N > 4] = 1 - P[N \le 4] = 1 - \sum_{k=0}^{4} \frac{(2/3)^k}{k!}e^{-2/3} = 6.33(10^{-4}).$$

In part b, the time interval of interest is t = 120 seconds, so α = (1/15) × 120 seconds = 8. The
probability of interest is:


$$P[N < 5] = P[N \le 4] = \sum_{k=0}^{4} \frac{(8)^k}{k!}e^{-8} = 0.10.$$
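The two probabilities in Example 3.30 are short finite sums and are easy to evaluate directly. The following sketch is illustrative and uses hypothetical helper names `poisson_pmf` and `poisson_cdf`; it relies only on the Poisson pmf of Eq. (3.39).

```python
from math import exp, factorial


def poisson_pmf(k, alpha):
    """P[N = k] for a Poisson random variable with mean alpha, Eq. (3.39)."""
    return alpha**k / factorial(k) * exp(-alpha)


def poisson_cdf(k, alpha):
    """P[N <= k], obtained by summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, alpha) for i in range(k + 1))


rate = 4 / 60                                # queries per second
p_more_than_4 = 1 - poisson_cdf(4, rate * 10)  # part (a): alpha = 2/3
p_fewer_than_5 = poisson_cdf(4, rate * 120)    # part (b): alpha = 8, "fewer than 5" means N <= 4
print(p_more_than_4, p_fewer_than_5)
```

Part (a) comes out near 6.3 × 10^-4 and part (b) near 0.10, in agreement with the values quoted in the example.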


Example 3.31 Arrivals at a Packet Multiplexer 


The number N of packet arrivals in t seconds at a multiplexer is a Poisson random variable with
α = λt where λ is the average arrival rate in packets/second. Find the probability that there are
no packet arrivals in t seconds.


$$P[N = 0] = \frac{(\lambda t)^0}{0!}e^{-\lambda t} = e^{-\lambda t}.$$

This equation has an interesting interpretation. Let Z be the time until the first packet
arrival. Suppose we ask, "What is the probability that Z > t, that is, the next arrival occurs t or
more seconds later?" Note that {N = 0} implies {Z > t} and vice versa, so $P[Z > t] = e^{-\lambda t}$.
The probability of no arrival decreases exponentially with t.
Note that we can also show that


$$P[N(t) \ge n] = 1 - P[N(t) < n] = 1 - \sum_{k=0}^{n-1} \frac{(\lambda t)^k}{k!}e^{-\lambda t}.$$


One of the applications of the Poisson probabilities in Eq. (3.39) is to
approximate the binomial probabilities in the case where p is very small and n is very large,
that is, where the event A of interest is very rare but the number of Bernoulli trials is
very large. We show that if α = np is fixed, then as n becomes large:


$$p_k = \binom{n}{k}p^k(1 - p)^{n-k} \simeq \frac{\alpha^k}{k!}e^{-\alpha} \quad \text{for } k = 0, 1, \ldots. \tag{3.40}$$

Equation (3.40) is obtained by taking the limit n → ∞ in the expression for $p_k$, while
keeping α = np fixed. First, consider the probability that no events occur in n trials:


$$p_0 = (1 - p)^n = \left(1 - \frac{\alpha}{n}\right)^n \to e^{-\alpha} \quad \text{as } n \to \infty, \tag{3.41}$$

where the limit in the last expression is a well known result from calculus. Consider the
ratio of successive binomial probabilities:

$$\frac{p_{k+1}}{p_k} = \frac{(n - k)p}{(k + 1)q} = \frac{(1 - k/n)\alpha}{(k + 1)(1 - \alpha/n)} \to \frac{\alpha}{k + 1} \quad \text{as } n \to \infty.$$

Thus the limiting probabilities satisfy

$$p_k = \frac{\alpha}{k}\,p_{k-1} = \left(\frac{\alpha}{k}\right)\left(\frac{\alpha}{k - 1}\right)\cdots\left(\frac{\alpha}{1}\right)p_0 = \frac{\alpha^k}{k!}e^{-\alpha}. \tag{3.42}$$


Thus the Poisson pmf can be used to approximate the binomial pmf for large n and
small p, using α = np.
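The quality of the approximation can be observed by holding α = np fixed and letting n grow. The sketch below is an illustration with hypothetical names; it reports the largest absolute difference between the binomial and Poisson pmf's over k = 0, ..., 10 for α = 1, which is an arbitrary choice of test values.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials, Eq. (3.33)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, alpha):
    """Poisson probability of k events with mean alpha, Eq. (3.39)."""
    return alpha**k / factorial(k) * exp(-alpha)


def max_abs_error(n, alpha, k_max=10):
    """Largest |binomial - Poisson| over k = 0..k_max, with p = alpha / n."""
    p = alpha / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, alpha)) for k in range(k_max + 1)
    )


errors = {n: max_abs_error(n, 1.0) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, err)
```

The error shrinks roughly in proportion to 1/n, which is consistent with the limit argument leading to Eq. (3.42).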


Example 3.32 Errors in Optical Transmission 


An optical communication system transmits information at a rate of $10^9$ bits/second. The
probability of a bit error in the optical communication system is $10^{-9}$. Find the probability of five or
more errors in 1 second.

Each bit transmission corresponds to a Bernoulli trial with a "success" corresponding to a
bit error in transmission. The probability of k errors in $n = 10^9$ transmissions (1 second) is then
given by the binomial probability with $n = 10^9$ and $p = 10^{-9}$. The Poisson approximation uses
$\alpha = np = 10^9(10^{-9}) = 1$. Thus

$$P[N \ge 5] = 1 - P[N < 5] = 1 - \sum_{k=0}^{4} \frac{\alpha^k}{k!}e^{-\alpha} = 1 - e^{-1}\left\{1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!}\right\} = 0.00366.$$


The Poisson random variable appears in numerous physical situations because
many models are very large in scale and involve very rare events. For example, the
Poisson pmf gives an accurate prediction for the relative frequencies of the number of
particles emitted by a radioactive mass during a fixed time period. This
correspondence can be explained as follows. A radioactive mass is composed of a large number
of atoms, say n. In a fixed time interval each atom has a very small probability p of
disintegrating and emitting a radioactive particle. If atoms disintegrate independently of
other atoms, then the number of emissions in a time interval can be viewed as the
number of successes in n trials. For example, one microgram of radium contains about
$n = 10^{16}$ atoms, and the probability that a single atom will disintegrate during a
one-millisecond time interval is $p = 10^{-15}$ [Rozanov, p. 58]. Thus it is an understatement to
say that the conditions for the approximation in Eq. (3.40) hold: n is so large and p so
small that one could argue that the limit n → ∞ has been carried out and that the
number of emissions is exactly a Poisson random variable.

The Poisson random variable also comes up in situations where we can imagine a
sequence of Bernoulli trials taking place in time or space. Suppose we count the
number of event occurrences in a T-second interval. Divide the time interval into a very
large number, n, of subintervals as shown in Fig. 3.10. A pulse in a subinterval indicates
the occurrence of an event. Each subinterval can be viewed as one in a sequence of
independent Bernoulli trials if the following conditions hold: (1) At most one event can
occur in a subinterval, that is, the probability of more than one event occurrence is
negligible; (2) the outcomes in different subintervals are independent; and (3) the
probability of an event occurrence in a subinterval is p = α/n, where α is the average
number of events observed in a 1-second interval. The number N of events in 1 second
is a binomial random variable with parameters n and p = α/n. Thus as n → ∞, N
becomes a Poisson random variable with parameter α. In Chapter 9 we will revisit this
result when we discuss the Poisson random process.

FIGURE 3.10
Event occurrences in n subintervals of [0, T].


The Uniform Random Variable 


The discrete uniform random variable Y takes on values in a set of consecutive
integers $S_Y = \{j + 1, \ldots, j + L\}$ with equal probability:


$$p_Y(k) = \frac{1}{L} \quad \text{for } k \in \{j + 1, \ldots, j + L\}. \tag{3.43}$$


This humble random variable occurs whenever outcomes are equally likely, e.g., toss of
a fair coin or a fair die, spinning of an arrow in a wheel divided into equal segments,
selection of numbers from an urn. It is easy to show that the mean and variance are:


$$E[Y] = j + \frac{L + 1}{2} \quad \text{and} \quad \mathrm{VAR}[Y] = \frac{L^2 - 1}{12}.$$


Example 3.33 Discrete Uniform Random Variable in Unit Interval


Let X be a uniform random variable in $S_X = \{0, 1, \ldots, L - 1\}$. We define the discrete uniform
random variable in the unit interval by


$$U = \frac{X}{L} \quad \text{so} \quad S_U = \left\{0, \frac{1}{L}, \frac{2}{L}, \frac{3}{L}, \ldots, 1 - \frac{1}{L}\right\}.$$


U has pmf:


$$P\left[U = \frac{k}{L}\right] = \frac{1}{L} \quad \text{for } k = 0, 1, 2, \ldots, L - 1.$$

The pmf of U puts equal probability mass 1/L on equally spaced points $x_k = k/L$ in the unit
interval. The probability of a subinterval of the unit interval is equal to the number of points in the
subinterval multiplied by 1/L. As L becomes very large, this probability is essentially the length
of the subinterval.


3.5.6 The Zipf Random Variable


The Zipf random variable is named for George Zipf who observed that the
frequency of words in a large body of text is inversely proportional to their rank. Suppose that words
are ranked from most frequent, to next most frequent, and so on. Let X be the rank
of a word, then $S_X = \{1, 2, \ldots, L\}$ where L is the number of distinct words. The pmf
of X is:

$$p_X(k) = \frac{1}{c_L}\,\frac{1}{k} \quad \text{for } k = 1, 2, \ldots, L, \tag{3.44}$$

where $c_L$ is a normalization constant. The second word has 1/2 the frequency of
occurrence as the first, the third word has 1/3 the frequency of the first, and so on. The
normalization constant $c_L$ is given by the sum:

$$c_L = \sum_{j=1}^{L} \frac{1}{j} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{L}. \tag{3.45}$$


The constant $c_L$ occurs frequently in calculus and is called the Lth harmonic
number; it increases approximately as ln L. For example, for L = 100, $c_{100} = 5.187378$
and $c_{100} - \ln(100) = 0.582207$. It can be shown that as L → ∞, $c_L - \ln L \to 0.57721\ldots$.
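The numerical values quoted for $c_L$ can be reproduced with a direct sum. The sketch below is illustrative only; `harmonic_number` is a hypothetical helper name, and the larger value of L is included simply to show the difference $c_L - \ln L$ drifting toward its limit.

```python
from math import log


def harmonic_number(L):
    """Normalization constant c_L = 1 + 1/2 + ... + 1/L of Eq. (3.45)."""
    return sum(1.0 / j for j in range(1, L + 1))


c_100 = harmonic_number(100)
print(c_100, c_100 - log(100))

# The gap c_L - ln(L) decreases slowly toward 0.57721... as L grows.
c_large = harmonic_number(1_000_000)
print(c_large - log(1_000_000))
```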


The mean of X is given by:

$$E[X] = \sum_{j=1}^{L} j\,p_X(j) = \sum_{j=1}^{L} j\,\frac{1}{c_L j} = \frac{L}{c_L}. \tag{3.46}$$

The second moment and variance of X are:

$$E[X^2] = \sum_{j=1}^{L} j^2\,\frac{1}{c_L j} = \frac{1}{c_L}\sum_{j=1}^{L} j = \frac{L(L + 1)}{2c_L}$$

and

$$\mathrm{VAR}[X] = \frac{L(L + 1)}{2c_L} - \frac{L^2}{c_L^2}. \tag{3.47}$$


The Zipf and related random variables have gained prominence with the
growth of the Internet where they have been found in a variety of measurement
studies involving Web page sizes, Web access behavior, and Web page interconnectivity.
These random variables had previously been found extensively in studies on the
distribution of wealth and, not surprisingly, are now found in Internet video rentals
and book sales.

FIGURE 3.11
Zipf distribution and its long tail.

FIGURE 3.12
Lorenz curve for Zipf random variable with L = 100.


Example 3.34 Rare Events and Long Tails 


The Zipf random variable X has the property that a few outcomes (words) occur frequently but
most outcomes occur rarely. Find the probability of words with rank higher than m.

$$P[X > m] = 1 - P[X \le m] = 1 - \frac{1}{c_L}\sum_{j=1}^{m} \frac{1}{j} = 1 - \frac{c_m}{c_L} \quad \text{for } m \le L. \tag{3.48}$$

We call P[X > m] the probability of the tail of the distribution of X. Figure 3.11 shows
P[X > m] with L = 100, which has $E[X] = 100/c_{100} = 19.28$. Figure 3.11 also shows
P[Y > m] for a geometric random variable with the same mean, that is, 1/p = 19.28. It can be
seen that P[Y > m] for the geometric random variable drops off much more quickly than
P[X > m]. The Zipf distribution is said to have a "long tail" because rare events are more
likely to occur than in traditional probability models.


Example 3.35 80/20 Rule and the Lorenz Curve 


Let X correspond to a level of wealth and $p_X(k)$ be the proportion of a population that has
wealth k. Suppose that X is a Zipf random variable. Thus $p_X(1)$ is the proportion of the
population with wealth 1, $p_X(2)$ the proportion with wealth 2, and so on. The long tail of the Zipf
distribution suggests that very rich individuals are not very rare. We frequently hear statements
such as "20% of the population owns 80% of the wealth." The Lorenz curve plots the proportion
of wealth owned by the poorest fraction x of the population, as x varies from 0 to 1. Find the
Lorenz curve for L = 100.
For k in {1, 2, ..., L}, the fraction of the population with wealth k or less is:


$$F_k = P[X \le k] = \frac{1}{c_L}\sum_{j=1}^{k} \frac{1}{j} = \frac{c_k}{c_L}. \tag{3.49}$$

The proportion of wealth owned by the population that has wealth k or less is:

$$W_k = \frac{\sum_{j=1}^{k} j\,p_X(j)}{\sum_{i=1}^{L} i\,p_X(i)} = \frac{\dfrac{1}{c_L}\sum_{j=1}^{k} j\,\dfrac{1}{j}}{\dfrac{1}{c_L}\sum_{i=1}^{L} i\,\dfrac{1}{i}} = \frac{k}{L}. \tag{3.50}$$


The denominator in the above expression is the total wealth of the entire population. The Lorenz
curve consists of the plot of points $(F_k, W_k)$ which is shown in Fig. 3.12 for L = 100. In the graph the
70% poorest proportion of the population own only 20% of the total wealth, or conversely, the 30%
wealthiest fraction of the population owns 80% of the wealth. See Problem 3.75 for a discussion of
what the Lorenz curve should look like in the cases of extreme fairness and extreme unfairness.
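The Lorenz curve points of Eqs. (3.49) and (3.50) are straightforward to tabulate. The following is a minimal sketch with hypothetical names; it builds the list of $(F_k, W_k)$ pairs for L = 100 and finds the first wealth level at which the cumulative population fraction reaches 70%.

```python
def lorenz_points(L):
    """Return [(F_k, W_k) for k = 1..L] for a Zipf random variable on {1, ..., L}."""
    c_L = sum(1 / j for j in range(1, L + 1))
    points = []
    c_k = 0.0
    for k in range(1, L + 1):
        c_k += 1 / k                      # running harmonic sum c_k
        points.append((c_k / c_L, k / L))  # F_k = c_k / c_L, W_k = k / L
    return points


points = lorenz_points(100)

# Smallest wealth level k whose cumulative population fraction F_k is at least 0.70.
k_70 = next(k for k, (F, W) in enumerate(points, start=1) if F >= 0.70)
F_70, W_70 = points[k_70 - 1]
print(k_70, F_70, W_70)
```

The poorest 70% of the population correspond to wealth levels up to about k = 21 and hold about 21% of the total wealth, in line with the roughly 20% read off the graph in Fig. 3.12.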


The explosive growth in the Internet has led to systems of huge scale. For
probability models this growth has implied random variables that can attain very large
values. Measurement studies have revealed many instances of random variables with long
tail distributions.

If we try to let L approach infinity in Eq. (3.45), $c_L$ grows without bound since the
series does not converge. However, if we make the pmf proportional to $(1/k)^{\alpha}$ then the
series converges as long as α > 1. We define the Zipf or zeta random variable with
range {1, 2, 3, ...} to have pmf:


$$p_Z(k) = \frac{1}{z_{\alpha}}\,\frac{1}{k^{\alpha}} \quad \text{for } k = 1, 2, \ldots, \tag{3.51}$$

where $z_{\alpha}$ is a normalization constant given by the zeta function which is defined by:

$$z_{\alpha} = \sum_{j=1}^{\infty} \frac{1}{j^{\alpha}} = 1 + \frac{1}{2^{\alpha}} + \frac{1}{3^{\alpha}} + \cdots \quad \text{for } \alpha > 1. \tag{3.52}$$

The convergence of the above series is discussed in standard calculus books.

The mean of Z is given by:

$$E[Z] = \sum_{j=1}^{\infty} j\,p_Z(j) = \sum_{j=1}^{\infty} j\,\frac{1}{z_{\alpha}j^{\alpha}} = \frac{1}{z_{\alpha}}\sum_{j=1}^{\infty} \frac{1}{j^{\alpha-1}} = \frac{z_{\alpha-1}}{z_{\alpha}} \quad \text{for } \alpha > 2,$$


where the sum of the sequence $1/j^{\alpha-1}$ converges only if α - 1 > 1, that is, α > 2. We
can similarly show that the second moment (and hence the variance) exists only if α > 3.


GENERATION OF DISCRETE RANDOM VARIABLES 


Suppose we wish to generate the outcomes of a random experiment that has sam- 
ple space S = {a,,a),...,a,} with probability of elementary events p; = P[{a;}]. 
We divide the unit interval into n subintervals. The jth subinterval has length p; and 
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FIGURE 3.13 
Generating a binomial random variable with n = 5, p = 1/2. 


corresponds to outcome aj. Each trial of the experiment first uses rand to obtain a 
number U in the unit interval. The outcome of the experiment is a; if U is in the jth 
subinterval. Figure 3.13 shows the portioning of the unit interval according to the 
pmf of ann = 5, p = 0.5 binomial random variable. 

The Octave function discrete_rnd implements the above method and can be
used to generate random numbers with desired probabilities. Functions to generate
random numbers with common distributions are also available. For example,
poisson_rnd (lambda, r, c) can be used to generate an r × c array of Poisson-distributed
random numbers with rate lambda.
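To show how a generator for a common distribution could be built on the same
subinterval method, here is a hedged Python sketch (not Octave). It is not a description
of how poisson_rnd is actually implemented; the names poisson_sample and
poisson_array are hypothetical. Because the Poisson range is infinite, the subinterval
edges are computed on the fly using P[N = k] = P[N = k − 1] · rate / k.

```python
import math
import random


def poisson_sample(rate, rng=random):
    """Draw one Poisson outcome by locating U among the cumulative pmf values.

    Intended for moderate rates: exp(-rate) underflows for very large rates.
    """
    u = rng.random()  # U is uniform in [0, 1)
    k = 0
    pmf = math.exp(-rate)  # P[N = 0]
    cdf = pmf              # right edge of subinterval 0
    # Stop when U falls inside the current subinterval. The pmf > 0 guard
    # ends the loop if round-off keeps the running sum just below U.
    while u >= cdf and pmf > 0.0:
        k += 1
        pmf *= rate / k    # P[N = k] from P[N = k - 1]
        cdf += pmf
    return k


def poisson_array(rate, rows, cols, rng=random):
    """Return a rows-by-cols nested list of Poisson samples."""
    return [[poisson_sample(rate, rng) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(2)  # fixed seed so the run is repeatable
    print(poisson_array(2, 1, 20)[0])  # a 1 x 20 array of samples with rate 2
```

The sample mean of a long run should be close to the rate, which gives a quick sanity
check on the sketch.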


Example 3.36 Generation of Tosses of a Die 


Use discrete_rnd to generate 20 samples of a toss of a die.

> V = 1:6;                                 % Define S_X = {1, 2, 3, 4, 5, 6}.
> P = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6];      % Set all the pmf values for X to 1/6.
> discrete_rnd (20, V, P)                  % Generate 20 samples from S_X with pmf P.
ans = 


622 65 2613 63163 42 5 3 4 «21 


Example 3.37 Generation of Poisson Random Variable 
Use the built-in function to generate 20 samples of a Poisson random variable with α = 2.

> poisson_rnd (2, 1, 20)    % Generate a 1 × 20 array of samples of a Poisson
                            % random variable with α = 2.


4 3 0 2 3 2 1 2 1 4 0 3 1 2 2 3 4 0 1 3

The problems at the end of the chapter elaborate on the rich set of experiments
that can be simulated using these basic capabilities of MATLAB or Octave. In the
remainder of this book, we will use Octave in examples because it is freely available.


SUMMARY 


• A random variable is a function that assigns a real number to each outcome of a
random experiment. A random variable is defined if the outcome of a random
experiment is a number, or if a numerical attribute of an outcome is of interest.

• The notion of an equivalent event enables us to derive the probabilities of events
involving a random variable in terms of the probabilities of events involving the
underlying outcomes.

• A random variable is discrete if it assumes values from some countable set. The
probability mass function is sufficient to calculate the probability of all events
involving a discrete random variable.

• The probability of events involving a discrete random variable X can be expressed
as a sum of values of the probability mass function p_X(x).

• If X is a random variable, then Y = g(X) is also a random variable.

• The mean, variance, and moments of a discrete random variable summarize some
of the information about the random variable X. These parameters are useful in
practice because they are easier to measure and estimate than the pmf.

• The conditional pmf allows us to calculate the probability of events given partial
information about the random variable X.

• There are a number of methods for generating discrete random variables with
prescribed pmf’s in terms of a random variable that is uniformly distributed in
the unit interval.


CHECKLIST OF IMPORTANT TERMS 


Discrete random variable
Equivalent event
Expected value of X
Function of a random variable
nth moment of X
Probability mass function
Random variable
Standard deviation of X
Variance of X


ANNOTATED REFERENCES 


Reference [1] is the standard reference for electrical engineers for the material on ran- 
dom variables. Reference [2] discusses some of the finer points regarding the concepts 
of a random variable at a level accessible to students of this course. Reference [3] is a 
classic text, rich in detailed examples. Reference [4] presents detailed discussions of the 
various methods for generating random numbers with specified distributions. Refer- 
ence [5] is entirely focused on discrete random variables. 


1. A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic 
Processes, 4th ed., McGraw-Hill, New York, 2002. 

2. K.L. Chung, Elementary Probability Theory, Springer-Verlag, New York, 1974. 

3. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New
York, 1968.

Section 3.1: The Notion of a Random Variable

3.1. Let X be the maximum of the number of heads obtained when Carlos and Michael each
flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and specify the
probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.

3.2. A die is tossed and the random variable X is defined as the number of full pairs of
dots in the face showing up.
(a) Describe the underlying space S of this random experiment and specify the
probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
(d) Repeat parts a, b, and c, if Y is the number of full or partial pairs of dots in the
face showing up.
(e) Explain why P[X = 0] and P[Y = 0] are not equal.

3.3. The loose minute hand of a clock is spun hard. The coordinates (x, y) of the point
where the tip of the hand comes to rest are noted. Z is defined as the sgn function of the
product of x and y, where sgn(t) is 1 if t > 0, 0 if t = 0, and −1 if t < 0.
(a) Describe the underlying space S of this random experiment and specify the
probabilities of its events.
(b) Show the mapping from S to S_Z, the range of Z.
(c) Find the probabilities for the various values of Z.

3.4. A data source generates hexadecimal characters. Let X be the integer value
corresponding to a hex character. Suppose that the four binary digits in the character are
independent and each is equally likely to be 0 or 1.
(a) Describe the underlying space S of this random experiment and specify the
probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
(d) Let Y be the integer value of a hex character but suppose that the most significant bit
is three times as likely to be a “0” as a “1”. Find the probabilities for the values of Y.

3.5. Two transmitters send messages through bursts of radio signals to an antenna. During
each time slot each transmitter sends a message with probability 1/2. Simultaneous
transmissions result in loss of the messages. Let X be the number of time slots until the
first message gets through.
(a) Describe the underlying sample space S of this random experiment and specify the
probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.

3.6. An information source produces binary triplets {000, 111, 010, 101, 001, 110, 100, 011}
with corresponding probabilities {1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16}. A binary code
assigns a codeword of length −log_2 p_k to triplet k. Let X be the length of the string
assigned to the output of the information source.
(a) Show the mapping from S to S_X, the range of X.
(b) Find the probabilities for the various values of X.

3.7. An urn contains 9 $1 bills and one $50 bill. Let the random variable X be the total
amount that results when two bills are drawn from the urn without replacement.
(a) Describe the underlying space S of this random experiment and specify the
probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.

3.8. An urn contains 9 $1 bills and one $50 bill. Let the random variable X be the total
amount that results when two bills are drawn from the urn with replacement.
(a) Describe the underlying space S of this random experiment and specify the
probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.

3.9. A coin is tossed n times. Let the random variable Y be the difference between the
number of heads and the number of tails in the n tosses of a coin. Assume P[heads] = p.
(a) Describe the sample space of S.
(b) Find the probability of the event {Y = 0}.
(c) Find the probabilities for the other values of Y.

3.10. An m-bit password is required to access a system. A hacker systematically works
through all possible m-bit patterns. Let X be the number of patterns tested until the
correct password is found.
(a) Describe the sample space of S.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.


Section 3.2: Discrete Random Variables and Probability Mass Function

3.11. Let X be the maximum of the coin tosses in Problem 3.1.
(a) Compare the pmf of X with the pmf of Y, the number of heads in two tosses of a fair
coin. Explain the difference.
(b) Suppose that Carlos uses a coin with probability of heads p = 3/4. Find the pmf
of X.

3.12. Consider an information source that produces binary pairs that we designate as
S_X = {1, 2, 3, 4}. Find and plot the pmf in the following cases:
(a) p_k = p_1/k for all k in S_X.
(b) p_{k+1} = p_k/2 for k = 2, 3, 4.
(c) p_{k+1} = p_k/2^k for k = 2, 3, 4.
(d) Can the random variables in parts a, b, and c be extended to take on values in the set
{1, 2, ...}? If yes, specify the pmf of the resulting random variables. If no, explain
why not.

3.13. Let X be a random variable with pmf p_k = c/k^2 for k = 1, 2, ....
(a) Estimate the value of c numerically. Note that the series converges.
(b) Find P[X > 4].
(c) Find P[6 ≤ X ≤ 8].

3.14. Compare P[X = 8] and P[Y = 8] for outputs of the data source in Problem 3.4.

3.15. In Problem 3.5 suppose that terminal 1 transmits with probability 1/2 in a given
time slot, but terminal 2 transmits with probability p.
(a) Find the pmf for the number of transmissions X until a message gets through.
(b) Given a successful transmission, find the probability that terminal 2 transmitted.

3.16. (a) In Problem 3.7 what is the probability that the amount drawn from the urn is more
than $2? More than $50?
(b) Repeat part a for Problem 3.8.

3.17. A modem transmits a +2 voltage signal into a channel. The channel adds to this
signal a noise term that is drawn from the set {0, −1, −2, −3} with respective
probabilities {4/10, 3/10, 2/10, 1/10}.
(a) Find the pmf of the output Y of the channel.
(b) What is the probability that the output of the channel is equal to the input of the
channel?
(c) What is the probability that the output of the channel is positive?

3.18. A computer reserves a path in a network for 10 minutes. To extend the reservation the
computer must successfully send a “refresh” message before the expiry time. However,
messages are lost with probability 1/2. Suppose that it takes 10 seconds to send a refresh
request and receive an acknowledgment. When should the computer start sending refresh
messages in order to have a 99% chance of successfully extending the reservation time?

3.19. A modem transmits over an error-prone channel, so it repeats every “0” or “1” bit
transmission five times. We call each such group of five bits a “codeword.” The channel
changes an input bit to its complement with probability p = 1/10 and it does so
independently of its treatment of other input bits. The modem receiver takes a majority
vote of the five received bits to estimate the input signal. Find the probability that the
receiver makes the wrong decision.

3.20. Two dice are tossed and we let X be the difference in the number of dots facing up.
(a) Find and plot the pmf of X.
(b) Find the probability that |X| = k for all k.

Section 3.3: Expected Value and Moments of Discrete Random Variable

3.21. (a) In Problem 3.11, compare E[Y] to E[X] where X is the maximum of coin tosses.
(b) Compare VAR[X] and VAR[Y].

3.22. Find the expected value and variance of the output of the information sources in
Problem 3.12, parts a, b, and c.

3.23. (a) Find E[X] for the hex integers in Problem 3.4.
(b) Find VAR[X].

3.24. Find the mean codeword length in Problem 3.6. How can this average be interpreted in
a very large number of encodings of binary triplets?

3.25. (a) Find the mean and variance of the amount drawn from the urn in Problem 3.7.
(b) Find the mean and variance of the amount drawn from the urn in Problem 3.8.

3.26. Find E[Y] and VAR[Y] for the difference between the number of heads and tails in
Problem 3.9. In a large number of repetitions of this random experiment, what is the
meaning of E[Y]?

3.27. Find E[X] and VAR[X] in Problem 3.13.

3.28. Find the expected value and variance of the modem signal in Problem 3.17.

3.29. Find the mean and variance of the time that it takes to renew the reservation in
Problem 3.18.

3.30. The modem in Problem 3.19 transmits 1000 5-bit codewords. What is the average number
of codewords in error? If the modem transmits 1000 bits individually without repetition,
what is the average number of bits in error? Explain how error rate is traded off against
transmission speed.

3.31. (a) Suppose a fair coin is tossed n times. Each coin toss costs d dollars and the
reward in obtaining X heads is aX^2 + bX. Find the expected value of the net reward.
(b) Suppose that the reward in obtaining X heads is a^X, where a > 0. Find the expected
value of the reward.

3.32. Let g(X) = I_A, where A = {X > 10}.
(a) Find E[g(X)] for X as in Problem 3.12a with S_X = {1, 2, ..., 15}.
(b) Repeat part a for X as in Problem 3.12b with S_X = {1, 2, ..., 15}.
(c) Repeat part a for X as in Problem 3.12c with S_X = {1, 2, ..., 15}.

3.33. Let g(X) = (X − 10)^+ (see Example 3.19).
(a) Find E[X] for X as in Problem 3.12a with S_X = {1, 2, ..., 15}.
(b) Repeat part a for X as in Problem 3.12b with S_X = {1, 2, ..., 15}.
(c) Repeat part a for X as in Problem 3.12c with S_X = {1, 2, ..., 15}.

3.34. Consider the St. Petersburg Paradox in Example 3.16. Suppose that the casino has a
total of M = 2^m dollars, and so it can only afford a finite number of coin tosses.
(a) How many tosses can the casino afford?
(b) Find the expected payoff to the player.
(c) How much should a player be willing to pay to play this game?


Section 3.4: Conditional Probability Mass Function

3.35. (a) In Problem 3.11a, find the conditional pmf of X, the maximum of coin tosses, given
that X > 0.
(b) Find the conditional pmf of X given that Michael got one head in two tosses.
(c) Find the conditional pmf of X given that Michael got one head in the first toss.
(d) In Problem 3.11b, find the probability that Carlos got the maximum given that X = 2.

3.36. Find the conditional pmf for the quaternary information source in Problem 3.12,
parts a, b, and c given that X < 4.

3.37. (a) Find the conditional pmf of the hex integer X in Problem 3.4 given that X < 8.
(b) Find the conditional pmf of X given that the first bit is 0.
(c) Find the conditional pmf of X given that the 4th bit is 0.

3.38. (a) Find the conditional pmf of X in Problem 3.5 given that no message gets through in
time slot 1.
(b) Find the conditional pmf of X given that the first transmitter transmitted in time
slot 1.

3.39. (a) Find the conditional expected value of X in Problem 3.5 given that no message gets
through in the first time slot. Show that E[X | X > 1] = E[X] + 1.
(b) Find the conditional expected value of X in Problem 3.5 given that a message gets
through in the first time slot.
(c) Find E[X] by using the results of parts a and b.
(d) Find E[X^2] and VAR[X] using the approach in parts b and c.

3.40. Explain why Eq. (3.31b) can be used to find E[X^2], but it cannot be used to directly
find VAR[X].

3.41. (a) Find the conditional pmf for X in Problem 3.7 given that the first draw produced k
dollars.
(b) Find the conditional expected value corresponding to part a.
(c) Find E[X] using the results from part b.
(d) Find E[X^2] and VAR[X] using the approach in parts b and c.

3.42. Find E[Y] and VAR[Y] for the difference between the number of heads and tails in n
tosses in Problem 3.9. Hint: Condition on the number of heads.

3.43. (a) In Problem 3.10 find the conditional pmf of X given that the password has not been
found after k tries.
(b) Find the conditional expected value of X given X > k.
(c) Find E[X] from the results in part b.


Section 3.5: Important Discrete Random Variables

3.44. Indicate the value of the indicator function for the event A, I_A(ζ), for each ζ in
the sample space S. Find the pmf and expected value of I_A.
(a) S = {1, 2, 3, 4, 5} and A = {ζ > 3}.
(b) S = [0, 1] and A = {0.3 < ζ ≤ 0.7}.
(c) S = {ζ = (x, y): 0 < x < 1, 0 < y < 1} and
A = {ζ = (x, y): 0.25 < x + y < 1.25}.
(d) S = (−∞, ∞) and A = {ζ > a}.

3.45. Let A and B be events for a random experiment with sample space S. Show that the
Bernoulli random variable satisfies the following properties:
(a) I_S = 1 and I_∅ = 0.
(b) I_{A∩B} = I_A I_B and I_{A∪B} = I_A + I_B − I_A I_B.
(c) Find the expected value of the indicator functions in parts a and b.

3.46. Heat must be removed from a system according to how fast it is generated. Suppose the
system has eight components each of which is active with probability 0.25, independently
of the others. The design of the heat removal system requires finding the probabilities of
the following events:
(a) None of the systems is active.
(b) Exactly one is active.
(c) More than four are active.
(d) More than two and fewer than six are active.

3.47. Eight numbers are selected at random from the unit interval.
(a) Find the probability that the first four numbers are less than 0.25 and the last four
are greater than 0.25.
(b) Find the probability that four numbers are less than 0.25 and four are greater than 0.25.
(c) Find the probability that the first three numbers are less than 0.25, the next two are
between 0.25 and 0.75, and the last three are greater than 0.75.
(d) Find the probability that three numbers are less than 0.25, two are between 0.25 and
0.75, and three are greater than 0.75.
(e) Find the probability that the first four numbers are less than 0.25 and the last four
are greater than 0.75.
(f) Find the probability that four numbers are less than 0.25 and four are greater than 0.75.

3.48. (a) Plot the pmf of the binomial random variable with n = 4 and n = 5, and
p = 0.10, p = 0.5, and p = 0.90.
(b) Use Octave to plot the pmf of the binomial random variable with n = 100 and
p = 0.10, p = 0.5, and p = 0.90.

3.49. Let X be a binomial random variable that results from the performance of n Bernoulli
trials with probability of success p.
(a) Suppose that X = 1. Find the probability that the single event occurred in the kth
Bernoulli trial.
(b) Suppose that X = 2. Find the probability that the two events occurred in the jth and
kth Bernoulli trials where j < k.
(c) In light of your answers to parts a and b in what sense are the successes distributed
“completely at random” over the n Bernoulli trials?

3.50. Let X be the binomial random variable.
(a) Show that

$$\frac{p_X(k+1)}{p_X(k)} = \frac{n-k}{k+1} \, \frac{p}{1-p}$$

where p_X(0) = (1 − p)^n.
(b) Show that part a implies that: (1) P[X = k] is maximum at k_max = [(n + 1)p],
where [x] denotes the largest integer that is smaller than or equal to x; and (2) when
(n + 1)p is an integer, then the maximum is achieved at k_max and k_max − 1.

3.51. Consider the expression (a + b + c)^n.
(a) Use the binomial expansion for (a + b) and c to obtain an expression for (a + b + c)^n.
(b) Now expand all terms of the form (a + b)^k and obtain an expression that
involves the multinomial coefficient for M = 3 mutually exclusive events,
A_1, A_2, A_3.
(c) Let p_1 = P[A_1], p_2 = P[A_2], p_3 = P[A_3]. Use the result from part b to show that
the multinomial probabilities add to one.

3.52. A sequence of characters is transmitted over a channel that introduces errors with
probability p = 0.01.
(a) What is the pmf of N, the number of error-free characters between erroneous
characters?
(b) What is E[N]?
(c) Suppose we want to be 99% sure that at least 1000 characters are received correctly
before a bad one occurs. What is the appropriate value of p?

3.53. Let N be a geometric random variable with S_N = {1, 2, ...}.
(a) Find P[N = k | N ≤ m].
(b) Find the probability that N is odd.

3.54. Let M be a geometric random variable. Show that M satisfies the memoryless property:
P[M ≥ k + j | M ≥ j + 1] = P[M ≥ k] for all j, k > 1.

3.55. Let X be a discrete random variable that assumes only nonnegative integer values and
that satisfies the memoryless property. Show that X must be a geometric random
variable. Hint: Find an equation that must be satisfied by g(m) = P[M ≥ m].

3.56. An audio player uses a low-quality hard drive. The initial cost of building the player
is $50. The hard drive fails after each month of use with probability 1/12. The cost to
repair the hard drive is $20. If a 1-year warranty is offered, how much should the
manufacturer charge so that the probability of losing money on a player is 1% or less? What
is the average cost per player?

3.57. A Christmas fruitcake has Poisson-distributed independent numbers of sultana raisins,
iridescent red cherry bits, and radioactive green cherry bits with respective averages 48,
24, and 12 bits per cake. Suppose you politely accept 1/12 of a slice of the cake.
(a) What is the probability that you get lucky and get no green bits in your slice?
(b) What is the probability that you get really lucky and get no green bits and two or
fewer red bits in your slice?
(c) What is the probability that you get extremely lucky and get no green or red bits and
more than five raisins in your slice?

3.58. The number of orders waiting to be processed is given by a Poisson random variable
with parameter α = λ/nμ, where λ is the average number of orders that arrive in a day, μ is
the number of orders that can be processed by an employee per day, and n is the number
of employees. Let λ = 5 and μ = 1. Find the number of employees required so the
probability that more than four orders are waiting is less than 10%. What is the probability
that there are no orders waiting?

3.59. The number of page requests that arrive at a Web server is a Poisson random variable
with an average of 6000 requests per minute.
(a) Find the probability that there are no requests in a 100-ms period.
(b) Find the probability that there are between 5 and 10 requests in a 100-ms period.

3.60. Use Octave to plot the pmf of the Poisson random variable with α = 0.1, 0.75, 2, 20.

3.61. Find the mean and variance of a Poisson random variable.

3.62. For the Poisson random variable, show that for α < 1, P[N = k] is maximum at k = 0;
for α > 1, P[N = k] is maximum at [α]; and if α is a positive integer, then P[N = k] is
maximum at k = α, and at k = α − 1. Hint: Use the approach of Problem 3.50.

3.63. Compare the Poisson approximation and the binomial probabilities for k = 0, 1, 2, 3
and n = 10, p = 0.1; n = 20 and p = 0.05; and n = 100 and p = 0.01.

3.64. At a given time, the number of households connected to the Internet is a Poisson
random variable with mean 50. Suppose that the transmission bit rate available for the
household is 20 Megabits per second.
(a) Find the probability of the distribution of the transmission bit rate per user.
(b) Find the transmission bit rate that is available to a user with probability 90% or
higher.
(c) What is the probability that a user has a share of 1 Megabit per second or higher?

3.65. An LCD display has 1000 × 750 pixels. A display is accepted if it has 15 or fewer
faulty pixels. The probability that a pixel is faulty coming out of the production line is
10°. Find the proportion of displays that are accepted.

3.66. A data center has 10,000 disk drives. Suppose that a disk drive fails in a given day
with probability 107°.
(a) Find the probability that there are no failures in a given day.
(b) Find the probability that there are fewer than 10 failures in two days.
(c) Find the number of spare disk drives that should be available so that all failures in a
day can be replaced with probability 99%.

3.67. A binary communication channel has a probability of bit error of 107°. Suppose that
transmissions occur in blocks of 10,000 bits. Let N be the number of errors introduced by
the channel in a transmission block.
(a) Find P[N = 0], P[N = 3].
(b) For what value of p will the probability of 1 or more errors in a block be 99%?

3.68. Find the mean and variance of the uniform discrete random variable that takes on
values in the set {1, 2, ..., L} with equal probability. You will need the following
formulas:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \qquad \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$$

3.69. A voltage X is uniformly distributed in the set {−3, ..., 3, 4}.
(a) Find the mean and variance of X.
(b) Find the mean and variance of Y = −2X^2 + 3.
(c) Find the mean and variance of W = cos(πX/8).
(d) Find the mean and variance of Z = cos^2(πX/8).

3.70. Ten news Web sites are ranked in terms of popularity, and the frequency of requests
to these sites are known to follow a Zipf distribution.
(a) What is the probability that a request is for the top-ranked site?
(b) What is the probability that a request is for one of the bottom five sites?

3.71. A collection of 1000 words is known to have a Zipf distribution.
(a) What is the probability of the 10 top-ranked words?
(b) What is the probability of the 10 lowest-ranked words?

3.72. What is the shape of the log of the Zipf probability vs. the log of the rank?

3.73. Plot the mean and variance of the Zipf random variable for L = 1 to L = 100.

3.74. An online video store has 10,000 titles. In order to provide fast response, the store
caches the most popular titles. How many titles should be in the cache so that with
probability 99% an arriving video request will be in the cache?

3.75. (a) Income distribution is perfectly equal if every individual has the same income.
What is the Lorenz curve in this case?
(b) In a perfectly unequal income distribution, one individual has all the income and all
others have none. What is the Lorenz curve in this case?

3.76. Let X be a geometric random variable in the set {1, 2, ...}.
(a) Find the pmf of X.
(b) Find the Lorenz curve of X. Assume L is infinite.
(c) Plot the curve for p = 0.1, 0.5, 0.9.

3.77. Let X be a zeta random variable with parameter α.
(a) Find an expression for P[X ≤ k].
(b) Plot the pmf of X for α = 1.5, 2, and 3.
(c) Plot P[X ≤ k] for α = 1.5, 2, and 3.


Section 3.6: Generation of Discrete Random Variables

3.78. Octave provides function calls to evaluate the pmf of important discrete random
variables. For example, the function poisson_pdf(x, lambda) computes the pmf at x for the
Poisson random variable.
(a) Plot the Poisson pmf for λ = 0.5, 5, 50, as well as P[X ≤ k] and P[X > k].
(b) Plot the binomial pmf for n = 48 and p = 0.10, 0.30, 0.50, 0.75, as well as P[X ≤ k]
and P[X > k].
(c) Compare the binomial probabilities with the Poisson approximation for n = 100,
p = 0.01.

3.79. The discrete_pdf function in Octave makes it possible to specify an arbitrary pmf for
a specified S_X.
(a) Plot the pmf for Zipf random variables with L = 10, 100, 1000, as well as P[X ≤ k]
and P[X > k].
(b) Plot the pmf for the reward in the St. Petersburg Paradox for m = 20 in Problem 3.34, as
well as P[X ≤ k] and P[X > k]. (You will need to use a log scale for the values of k.)

3.80. Use Octave to plot the Lorenz curve for the Zipf random variables in Problem 3.79a.

3.81. Repeat Problem 3.80 for the binomial random variable with n = 100 and p = 0.1, 0.5,
and 0.9.

3.82. (a) Use the discrete_rnd function in Octave to simulate the urn experiment discussed
in Section 1.3. Compute the relative frequencies of the outcomes in 1000 draws from the urn.
(b) Use the discrete_pdf function in Octave to specify a pmf for a binomial random
variable with n = 5 and p = 0.2. Use discrete_rnd to generate 100 samples and
plot the relative frequencies.
(c) Use binomial_rnd to generate the 100 samples in part b.

3.83. Use the discrete_rnd function to generate 200 samples of the Zipf random variable
in Problem 3.79a. Plot the sequence of outcomes as well as the overall relative
frequencies.

3.84. Use the discrete_rnd function to generate 200 samples of the St. Petersburg Paradox
random variable in Problem 3.79b. Plot the sequence of outcomes as well as the overall
relative frequencies.

3.85. Use Octave to generate 200 pairs of numbers, (X_i, Y_i), in which the components are
independent, and each component is uniform in the set {1, 2, ..., 9, 10}.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Can you discern
the pmf of Z?
(c) Plot the relative frequencies of W = XY. Can you discern the pmf of W?
(d) Plot the relative frequencies of V = X/Y. Is the pmf discernible?

3.86. Use Octave function binomial_rnd to generate 200 pairs of numbers, (X_i, Y_i), in
which the components are independent, and where X_i are binomial with parameter
n = 8, p = 0.5 and Y_i are binomial with parameter n = 4, p = 0.5.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Does this
correspond to the pmf you would expect? Explain.

3.87. Use Octave function poisson_rnd to generate 200 pairs of numbers, (X_i, Y_i), in
which the components are independent, and where X_i are the number of arrivals to a system
in one second and Y_i are the number of arrivals to the system in the next two seconds.
Assume that the arrival rate is five customers per second.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Does this
correspond to the pmf you would expect? Explain.


Problems Requiring Cumulative Knowledge 


3.88. The fraction of defective items in a production line is p. Each item is tested and defective
items are identified correctly with probability a.

(a) Assume nondefective items always pass the test. What is the probability that k items
are tested until a defective item is identified?

(b) Suppose that the identified defective items are removed. What proportion of the
remaining items is defective?

(c) Now suppose that nondefective items are identified as defective with probability b.
Repeat part b.


3.89. A data transmission system uses messages of duration T seconds. After each message
transmission, the transmitter stops and waits T seconds for a reply from the receiver. The
receiver immediately replies with a message indicating that a message was received correctly.
The transmitter proceeds to send a new message if it receives a reply within T seconds;
otherwise, it retransmits the previous message. Suppose that messages can be completely
garbled while in transit and that this occurs with probability p. Find the maximum possible rate
at which messages can be successfully transmitted from the transmitter to the receiver.


3.90. An inspector selects every nth item in a production line for a detailed inspection.
Suppose that the time between item arrivals is an exponential random variable with mean 1
minute, and suppose that it takes 2 minutes to inspect an item. Find the smallest value of
n such that with a probability of 90% or more, the inspection is completed before the
arrival of the next item that requires inspection.


3.91. The number X of photons counted by a receiver in an optical communication system is a
Poisson random variable with rate $\lambda_1$ when a signal is present and a Poisson random variable
with rate $\lambda_0 < \lambda_1$ when a signal is absent. Suppose that a signal is present with probability p.

(a) Find P[signal present | X = k] and P[signal absent | X = k].

(b) The receiver uses the following decision rule:


If P[signal present | X = k] > P[signal absent | X = k], decide signal present;
otherwise, decide signal absent.


Show that this decision rule leads to the following threshold rule:


If X > T, decide signal present; otherwise, decide signal absent.


(c) What is the probability of error for the above decision rule?




3.92. A binary information source (e.g., a document scanner) generates very long strings of 0’s
followed by occasional 1’s. Suppose that symbols are independent and that p = P[symbol = 0]
is very close to one. Consider the following scheme for encoding the run X of 0’s between
consecutive 1’s:

1. If X = n, express n as a multiple of an integer $M = 2^m$ and a remainder r, that is, find
k and r such that n = kM + r, where $0 \le r \le M - 1$;

2. The binary codeword for n then consists of a prefix consisting of k 0’s followed by a 1,
and a suffix consisting of the m-bit representation of the remainder r. The decoder can
deduce the value of n from this binary string.

(a) Find the probability that the prefix has k zeros, assuming that $p^M = 1/2$.

(b) Find the average codeword length when $p^M = 1/2$.

(c) Find the compression ratio, which is defined as the ratio of the average run length
to the average codeword length when $p^M = 1/2$.


CHAPTER 4


One Random Variable


In Chapter 3 we introduced the notion of a random variable and we developed methods
for calculating probabilities and averages for the case where the random variable is
discrete. In this chapter we consider the general case where the random variable may
be discrete, continuous, or of mixed type. We introduce the cumulative distribution
function which is used in the formal definition of a random variable, and which can
handle all three types of random variables. We also introduce the probability density
function for continuous random variables. The probabilities of events involving a random
variable can be expressed as integrals of its probability density function. The expected
value of continuous random variables is also introduced and related to our
intuitive notion of average. We develop a number of methods for calculating probabilities
and averages that are the basic tools in the analysis and design of systems that involve
randomness.


4.1 THE CUMULATIVE DISTRIBUTION FUNCTION


The probability mass function of a discrete random variable was defined in terms of
events of the form $\{X = b\}$. The cumulative distribution function is an alternative
approach which uses events of the form $\{X \le b\}$. The cumulative distribution function
has the advantage that it is not limited to discrete random variables and applies to all
types of random variables. We begin with a formal definition of a random variable.


Definition: Consider a random experiment with sample space S and event
class $\mathcal{F}$. A random variable X is a function from the sample space S to R with
the property that the set $A_b = \{\zeta : X(\zeta) \le b\}$ is in $\mathcal{F}$ for every b in R.


The definition simply requires that every set $A_b$ have a well defined probability in
the underlying random experiment, and this is not a problem in the cases we will consider.
Why does the definition use sets of the form $\{\zeta : X(\zeta) \le b\}$ and not $\{\zeta : X(\zeta) = x_b\}$?
We will see that all events of interest in the real line can be expressed in terms of sets of
the form $\{\zeta : X(\zeta) \le b\}$.

The cumulative distribution function (cdf) of a random variable X is defined as
the probability of the event $\{X \le x\}$:


$$F_X(x) = P[X \le x] \quad \text{for } -\infty < x < +\infty,$$   (4.1)

that is, it is the probability that the random variable X takes on a value in the set
$(-\infty, x]$. In terms of the underlying sample space, the cdf is the probability of the
event $\{\zeta : X(\zeta) \le x\}$. The event $\{X \le x\}$ and its probability vary as x is varied; in
other words, $F_X(x)$ is a function of the variable x.

The cdf is simply a convenient way of specifying the probability of all semi-infinite
intervals of the real line of the form $(-\infty, b]$. The events of interest when dealing
with numbers are intervals of the real line, and their complements, unions, and
intersections. We show below that the probabilities of all of these events can be expressed in
terms of the cdf.

The cdf has the following interpretation in terms of relative frequency. Suppose
that the experiment that yields the outcome $\zeta$, and hence $X(\zeta)$, is performed a large
number of times. $F_X(b)$ is then the long-term proportion of times in which $X(\zeta) \le b$.

Before developing the general properties of the cdf, we present examples of the 
cdfs for three basic types of random variables. 


Example 4.1 Three Coin Tosses 


Figure 4.1(a) shows the cdf of X, the number of heads in three tosses of a fair coin. From Example 3.1
we know that X takes on only the values 0, 1, 2, and 3 with probabilities 1/8, 3/8, 3/8, and 1/8,
respectively, so $F_X(x)$ is simply the sum of the probabilities of the outcomes from {0, 1, 2, 3} that are less
than or equal to x. The resulting cdf is seen to be a nondecreasing staircase function that grows from
0 to 1. The cdf has jumps at the points 0, 1, 2, 3 of magnitudes 1/8, 3/8, 3/8, and 1/8, respectively.


Let us take a closer look at one of these discontinuities, say, in the vicinity of
x = 1. For $\delta$ a small positive number, we have

$$F_X(1 - \delta) = P[X \le 1 - \delta] = P[0 \text{ heads}] = \frac{1}{8},$$

so the limit of the cdf as x approaches 1 from the left is 1/8. However,

$$F_X(1) = P[X \le 1] = P[0 \text{ or } 1 \text{ heads}] = \frac{1}{8} + \frac{3}{8} = \frac{1}{2},$$

and furthermore the limit from the right is

$$F_X(1 + \delta) = P[X \le 1 + \delta] = P[0 \text{ or } 1 \text{ heads}] = \frac{1}{2}.$$


Thus the cdf is continuous from the right and equal to 1/2 at the point x = 1. Indeed,
we note the magnitude of the jump at the point x = 1 is equal to P[X = 1] = 1/2 - 1/8 = 3/8.
Henceforth we will use dots in the graph to indicate the value of the cdf at
the points of discontinuity.


FIGURE 4.1
cdf (a) and pdf (b) of a discrete random variable.

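The cdf can be written compactly in terms of the unit step function:

$$u(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \ge 0, \end{cases}$$   (4.2)

then

$$F_X(x) = \frac{1}{8}u(x) + \frac{3}{8}u(x - 1) + \frac{3}{8}u(x - 2) + \frac{1}{8}u(x - 3).$$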
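
The step-function form of the cdf is easy to check numerically. The following is a minimal sketch, not part of the original text: the names `unit_step`, `PMF`, and `cdf` are our own, and exact fractions are used so that the jump sizes can be read off directly.

```python
from fractions import Fraction


def unit_step(x):
    """Right-continuous unit step: u(x) = 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


# pmf of X = number of heads in three tosses of a fair coin (Example 4.1)
PMF = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def cdf(x):
    """F_X(x) written as a weighted sum of shifted unit steps."""
    return sum(p * unit_step(x - xk) for xk, p in PMF.items())


for x in (-0.5, 0, 0.999, 1, 2.5, 3):
    print(x, cdf(x))
```

Evaluating just below x = 1 and at x = 1 gives 1/8 and 1/2, which reproduces the jump of 3/8 at that point.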


Example 4.2 Uniform Random Variable in the Unit Interval 
Spin an arrow attached to the center of a circular board. Let $\theta$ be the final angle of the arrow,
where $0 < \theta \le 2\pi$. The probability that $\theta$ falls in a subinterval of $(0, 2\pi]$ is proportional to
the length of the subinterval. The random variable X is defined by $X(\theta) = \theta/2\pi$. Find the cdf
of X.

As $\theta$ increases from 0 to $2\pi$, X increases from 0 to 1. No outcomes $\theta$ lead to values $x \le 0$, so

$$F_X(x) = P[X \le x] = P[\emptyset] = 0 \quad \text{for } x < 0.$$

For $0 < x \le 1$, $\{X \le x\}$ occurs when $\{\theta \le 2\pi x\}$ so

$$F_X(x) = P[X \le x] = P[\{\theta \le 2\pi x\}] = 2\pi x/2\pi = x, \quad 0 < x \le 1.$$   (4.3)

Finally, for x > 1, all outcomes $\theta$ lead to $\{X(\theta) \le 1 < x\}$, therefore:

$$F_X(x) = P[X \le x] = P[0 < \theta \le 2\pi] = 1 \quad \text{for } x > 1.$$

We say that X is a uniform random variable in the unit interval. Figure 4.2(a) shows the cdf
of the general uniform random variable X. We see that $F_X(x)$ is a nondecreasing continuous
function that grows from 0 to 1 as x ranges from its minimum values to its maximum values.


FIGURE 4.2
cdf (a) and pdf (b) of a continuous random variable.


Example 4.3


The waiting time X of a customer at a taxi stand is zero if the customer finds a taxi parked at the 
stand, and a uniformly distributed random length of time in the interval [0,1] (in hours) if no 
taxi is found upon arrival. The probability that a taxi is at the stand when the customer arrives is 
p. Find the cdf of X. 

The cdf is found by applying the theorem on total probability:

$$F_X(x) = P[X \le x] = P[X \le x \mid \text{find taxi}]\,p + P[X \le x \mid \text{no taxi}](1 - p).$$

Note that $P[X \le x \mid \text{find taxi}] = 1$ when $x \ge 0$ and 0 otherwise. Furthermore $P[X \le x \mid \text{no taxi}]$
is given by Eq. (4.3), therefore

$$F_X(x) = \begin{cases} 0 & x < 0 \\ p + (1 - p)x & 0 \le x \le 1 \\ 1 & x > 1. \end{cases}$$

The cdf, shown in Fig. 4.3(a), combines some of the properties of the cdf in Example 4.1
(discontinuity at 0) and the cdf in Example 4.2 (continuity over intervals). Note that $F_X(x)$ can
be expressed as the sum of a step function with amplitude p and a continuous function of x.
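
The mixed cdf of Example 4.3 can be compared against simulated waiting times. This is a rough sketch under stated assumptions: the value p = 0.3, the sample size, the seed, and the helper names `taxi_cdf`, `sample_wait`, and `empirical_cdf` are all illustrative choices of ours.

```python
import random


def taxi_cdf(x, p):
    """cdf of the waiting time in Example 4.3."""
    if x < 0:
        return 0.0
    if x <= 1:
        return p + (1 - p) * x
    return 1.0


def sample_wait(p, rng):
    """Zero wait with probability p, otherwise a uniform wait in the unit interval."""
    return 0.0 if rng.random() < p else rng.random()


def empirical_cdf(samples, x):
    """Relative frequency of the event {X <= x} in the sample."""
    return sum(s <= x for s in samples) / len(samples)


rng = random.Random(1)      # fixed seed so the run is repeatable
p = 0.3                     # assumed probability of finding a taxi
samples = [sample_wait(p, rng) for _ in range(100_000)]

for x in (0.0, 0.25, 0.5, 1.0):
    print(x, taxi_cdf(x, p), round(empirical_cdf(samples, x), 3))
```

The relative frequency at x = 0 should be close to p, reflecting the jump of the cdf at the origin.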


We are now ready to state the basic properties of the cdf. The axioms of probability
and their corollaries imply that the cdf has the following properties:

(i) $0 \le F_X(x) \le 1$.
(ii) $\lim_{x \to \infty} F_X(x) = 1$.
(iii) $\lim_{x \to -\infty} F_X(x) = 0$.
(iv) $F_X(x)$ is a nondecreasing function of x, that is, if a < b, then $F_X(a) \le F_X(b)$.
(v) $F_X(x)$ is continuous from the right, that is, for h > 0, $F_X(b) = \lim_{h \to 0} F_X(b + h) = F_X(b^+)$.


These five properties confirm that, in general, the cdf is a nondecreasing function that
grows from 0 to 1 as x increases from $-\infty$ to $\infty$. We already observed these properties
in Examples 4.1, 4.2, and 4.3. Property (v) implies that at points of discontinuity, the cdf
is equal to the limit from the right. We observed this property in Examples 4.1 and 4.3.
In Example 4.2 the cdf is continuous for all values of x, that is, the cdf is continuous both
from the right and from the left for all x.


FIGURE 4.3
cdf (a) and pdf (b) of a random variable of mixed type.

The cdf has the following properties which allow us to calculate the probability of
events involving intervals and single values of X:

(vi) $P[a < X \le b] = F_X(b) - F_X(a)$.
(vii) $P[X = b] = F_X(b) - F_X(b^-)$.
(viii) $P[X > x] = 1 - F_X(x)$.

Property (vii) states that the probability that X = b is given by the magnitude of the
jump of the cdf at the point b. This implies that if the cdf is continuous at a point b, then
P[X = b] = 0. Properties (vi) and (vii) can be combined to compute the probabilities
of other types of intervals. For example, since $\{a \le X \le b\} = \{X = a\} \cup \{a < X \le b\}$, then

$$P[a \le X \le b] = P[X = a] + P[a < X \le b]
= F_X(a) - F_X(a^-) + F_X(b) - F_X(a) = F_X(b) - F_X(a^-).$$   (4.4)

If the cdf is continuous at the endpoints of an interval, then the endpoints have zero
probability, and therefore they can be included in, or excluded from, the interval
without affecting the probability.


Example 4.4 


Let X be the number of heads in three tosses of a fair coin. Use the cdf to find the probability of
the events $A = \{1 < X \le 2\}$, $B = \{0.5 \le X < 2.5\}$, and $C = \{1 \le X < 2\}$.

From property (vi) and Fig. 4.1 we have

$$P[1 < X \le 2] = F_X(2) - F_X(1) = 7/8 - 1/2 = 3/8.$$

The cdf is continuous at x = 0.5 and x = 2.5, so

$$P[0.5 \le X < 2.5] = F_X(2.5) - F_X(0.5) = 7/8 - 1/8 = 6/8.$$

Since $\{1 \le X < 2\} \cup \{X = 2\} = \{1 \le X \le 2\}$, from Eq. (4.4) we have

$$P[1 \le X < 2] + P[X = 2] = F_X(2) - F_X(1^-),$$

and using property (vii) for P[X = 2]:

$$P[1 \le X < 2] = F_X(2) - F_X(1^-) - P[X = 2] = F_X(2) - F_X(1^-) - (F_X(2) - F_X(2^-))
= F_X(2^-) - F_X(1^-) = 4/8 - 1/8 = 3/8.$$
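
The three probabilities in Example 4.4 can be reproduced by evaluating the cdf and its left limits. The sketch below is illustrative only: the helper `cdf_left`, which stands in for $F_X(x^-)$, and the variable names are our own.

```python
from fractions import Fraction

# pmf of X = number of heads in three tosses of a fair coin
PMF = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def cdf(x):
    """F_X(x) = P[X <= x]."""
    return sum(p for xk, p in PMF.items() if xk <= x)


def cdf_left(x):
    """Left limit F_X(x^-) = P[X < x]."""
    return sum(p for xk, p in PMF.items() if xk < x)


p_a = cdf(2) - cdf(1)                  # P[1 < X <= 2], property (vi)
p_b = cdf_left(2.5) - cdf_left(0.5)    # P[0.5 <= X < 2.5]
p_c = cdf_left(2) - cdf_left(1)        # P[1 <= X < 2]
print(p_a, p_b, p_c)                   # prints: 3/8 3/4 3/8
```

Whether an endpoint is included only matters where the cdf jumps, which is why events A and C differ in how the points 1 and 2 are treated.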


Example 4.5 


Let X be the uniform random variable from Example 4.2. Use the cdf to find the probability of
the events $\{-0.5 < X < 0.25\}$, $\{0.3 < X < 0.65\}$, and $\{|X - 0.4| > 0.2\}$.

The cdf of X is continuous at every point so we have:

$$P[-0.5 < X \le 0.25] = F_X(0.25) - F_X(-0.5) = 0.25 - 0 = 0.25,$$

$$P[0.3 < X < 0.65] = F_X(0.65) - F_X(0.3) = 0.65 - 0.3 = 0.35,$$

$$P[|X - 0.4| > 0.2] = P[\{X < 0.2\} \cup \{X > 0.6\}] = P[X < 0.2] + P[X > 0.6]
= F_X(0.2) + (1 - F_X(0.6)) = 0.2 + 0.4 = 0.6.$$


We now consider the proof of the properties of the cdf. 


• Property (i) follows from the fact that the cdf is a probability and hence must
satisfy Axiom I and Corollary 2.

• To obtain property (iv), we note that the event $\{X \le a\}$ is a subset of $\{X \le b\}$,
and so it must have smaller or equal probability (Corollary 7).

• To show property (vi), we note that $\{X \le b\}$ can be expressed as the union of
mutually exclusive events: $\{X \le a\} \cup \{a < X \le b\} = \{X \le b\}$, and so by
Axiom III, $F_X(a) + P[a < X \le b] = F_X(b)$.

• Property (viii) follows from $\{X > x\} = \{X \le x\}^c$ and Corollary 1.


While intuitively clear, properties (ii), (iii), (v), and (vii) require more advanced
limiting arguments that are discussed at the end of this section.


The Three Types of Random Variables 


The random variables in Examples 4.1, 4.2, and 4.3 are typical of the three most basic 
types of random variable that we are interested in. 

Discrete random variables have a cdf that is a right-continuous, staircase function
of x, with jumps at a countable set of points $x_0, x_1, x_2, \ldots$. The random variable in
Example 4.1 is a typical example of a discrete random variable. The cdf $F_X(x)$ of a
discrete random variable is the sum of the probabilities of the outcomes less than or equal to x and
can be written as the weighted sum of unit step functions as in Example 4.1:

$$F_X(x) = \sum_{x_k \le x} p_X(x_k) = \sum_k p_X(x_k)\, u(x - x_k),$$   (4.5)

where the pmf $p_X(x_k) = P[X = x_k]$ gives the magnitude of the jumps in the cdf. We
see that the pmf can be obtained from the cdf and vice versa.

A continuous random variable is defined as a random variable whose cdf $F_X(x)$
is continuous everywhere, and which, in addition, is sufficiently smooth that it can be
written as an integral of some nonnegative function f(x):

$$F_X(x) = \int_{-\infty}^{x} f(t)\, dt.$$   (4.6)

The random variable discussed in Example 4.2 can be written as an integral of the function
shown in Fig. 4.2(b). The continuity of the cdf and property (vii) implies that continuous
random variables have P[X = x] = 0 for all x. Every possible outcome has probability
zero! An immediate consequence is that the pmf cannot be used to characterize the
probabilities of X. A comparison of Eqs. (4.5) and (4.6) suggests how we can proceed to
characterize continuous random variables. For discrete random variables (Eq. 4.5), we calculate
probabilities as summations of probability masses at discrete points. For continuous
random variables (Eq. 4.6), we calculate probabilities as integrals of “probability densities”
over intervals of the real line.

A random variable of mixed type is a random variable with a cdf that has jumps
on a countable set of points $x_0, x_1, x_2, \ldots$, but that also increases continuously over at
least one interval of values of x. The cdf for these random variables has the form

$$F_X(x) = pF_1(x) + (1 - p)F_2(x),$$

where 0 < p < 1, and $F_1(x)$ is the cdf of a discrete random variable and $F_2(x)$ is the cdf
of a continuous random variable. The random variable in Example 4.3 is of mixed type.

Random variables of mixed type can be viewed as being produced by a two-step
process: A coin is tossed; if the outcome of the toss is heads, a discrete random variable
is generated according to $F_1(x)$; otherwise, a continuous random variable is generated
according to $F_2(x)$.


*4.1.2 Fine Point: Limiting properties of cdf
Properties (ii), (iii), (v), and (vii) require the continuity property of the probability
function discussed in Section 2.9. For example, for property (ii), we consider the
sequence of events $\{X \le n\}$ which increases to include all of the sample space S as n
approaches $\infty$, that is, all outcomes lead to a value of X less than infinity. The continuity
property of the probability function (Corollary 8) implies that:

$$\lim_{n \to \infty} F_X(n) = \lim_{n \to \infty} P[X \le n] = P[\lim_{n \to \infty} \{X \le n\}] = P[S] = 1.$$

For property (iii), we take the sequence $\{X \le -n\}$ which decreases to the empty set
$\emptyset$, that is, no outcome leads to a value of X less than $-\infty$:

$$\lim_{n \to \infty} F_X(-n) = \lim_{n \to \infty} P[X \le -n] = P[\lim_{n \to \infty} \{X \le -n\}] = P[\emptyset] = 0.$$

For property (v), we take the sequence of events $\{X \le x + 1/n\}$ which decreases to
$\{X \le x\}$ from the right:

$$\lim_{n \to \infty} F_X(x + 1/n) = \lim_{n \to \infty} P[X \le x + 1/n]
= P[\lim_{n \to \infty} \{X \le x + 1/n\}] = P[\{X \le x\}] = F_X(x).$$

Finally, for property (vii), we take the sequence of events $\{b - 1/n < X \le b\}$ which
decreases to {b} from the left:

$$\lim_{n \to \infty} (F_X(b) - F_X(b - 1/n)) = \lim_{n \to \infty} P[b - 1/n < X \le b]
= P[\lim_{n \to \infty} \{b - 1/n < X \le b\}] = P[X = b].$$


4.2 THE PROBABILITY DENSITY FUNCTION

The probability density function of X (pdf), if it exists, is defined as the derivative of
$F_X(x)$:

$$f_X(x) = \frac{dF_X(x)}{dx}.$$   (4.7)

In this section we show that the pdf is an alternative, and more useful, way of
specifying the information contained in the cumulative distribution function.

The pdf represents the “density” of probability at the point x in the following
sense: The probability that X is in a small interval in the vicinity of x, that is,
$\{x < X \le x + h\}$, is

$$P[x < X \le x + h] = F_X(x + h) - F_X(x) = \frac{F_X(x + h) - F_X(x)}{h}\, h.$$   (4.8)

If the cdf has a derivative at x, then as h becomes very small,

$$P[x < X \le x + h] \simeq f_X(x)\, h.$$   (4.9)

Thus $f_X(x)$ represents the “density” of probability at the point x in the sense that the
probability that X is in a small interval in the vicinity of x is approximately $f_X(x)h$. The
derivative of the cdf, when it exists, is nonnegative since the cdf is a nondecreasing function of x, thus

(i) $f_X(x) \ge 0$.   (4.10)
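
The approximation in Eq. (4.9) can be examined numerically. The following sketch assumes, purely for illustration, an exponential random variable with rate `LAM = 2.0` and the point x = 0.5; the function names are our own. It compares the exact interval probability from the cdf with the approximation $f_X(x)h$ as h shrinks.

```python
import math

LAM = 2.0  # assumed rate parameter, chosen only for illustration


def cdf(x):
    """Exponential cdf: F_X(x) = 1 - exp(-LAM x) for x >= 0."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0


def pdf(x):
    """Derivative of the cdf: f_X(x) = LAM exp(-LAM x) for x >= 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def relative_error(x, h):
    """Relative gap between P[x < X <= x + h] and the approximation f_X(x) h."""
    exact = cdf(x + h) - cdf(x)
    return abs(exact - pdf(x) * h) / exact


for h in (0.1, 0.01, 0.001):
    print(h, relative_error(0.5, h))
```

The relative error shrinks roughly in proportion to h, which is the sense in which the approximation becomes exact as the interval width goes to zero.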


Equations (4.9) and (4.10) provide us with an alternative approach to specifying
the probabilities involving the random variable X. We can begin by stating a nonnegative
function $f_X(x)$, called the probability density function, which specifies the
probabilities of events of the form “X falls in a small interval of width dx about the point x,”
as shown in Fig. 4.4(a). The probabilities of events involving X are then expressed in
terms of the pdf by adding the probabilities of intervals of width dx. As the widths of
the intervals approach zero, we obtain an integral in terms of the pdf. For example, the
probability of an interval [a, b] is

(ii) $P[a \le X \le b] = \int_a^b f_X(x)\, dx$.   (4.11)

The probability of an interval is therefore the area under $f_X(x)$ in that interval, as shown
in Fig. 4.4(b). The probability of any event that consists of the union of disjoint
intervals can thus be found by adding the integrals of the pdf over each of the intervals.

The cdf of X can be obtained by integrating the pdf:

(iii) $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$.   (4.12)


In Section 4.1, we defined a continuous random variable as a random variable X whose
cdf was given by Eq. (4.12). Since the probabilities of all events involving X can be
written in terms of the cdf, it then follows that these probabilities can be written in
terms of the pdf. Thus the pdf completely specifies the behavior of continuous random
variables.


FIGURE 4.4
(a) The probability density function specifies the probability of intervals of infinitesimal width:
$P[x < X \le x + dx] = f_X(x)\,dx$. (b) The probability of an interval [a, b] is the area under the pdf
in that interval: $P[a \le X \le b] = \int_a^b f_X(x)\,dx$.
By letting x tend to infinity in Eq. (4.12), we obtain a normalization condition for
pdf’s:

(iv) $1 = \int_{-\infty}^{+\infty} f_X(t)\, dt$.   (4.13)

The pdf reinforces the intuitive notion of probability as having attributes similar
to “physical mass.” Thus Eq. (4.11) states that the probability “mass” in an interval is
the integral of the “density of probability mass” over the interval. Equation (4.13)
states that the total mass available is one unit.

A valid pdf can be formed from any nonnegative, piecewise continuous function
g(x) that has a finite integral:

$$\int_{-\infty}^{\infty} g(x)\, dx = c < \infty.$$   (4.14)

By letting $f_X(x) = g(x)/c$, we obtain a function that satisfies the normalization
condition. Note that the pdf must be defined for all real values of x; if X does not take on
values from some region of the real line, we simply set $f_X(x) = 0$ in the region.
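
The normalization step in Eq. (4.14) can be carried out numerically. The sketch below is only an illustration: the function g(x) = x(2 - x) on [0, 2] is a hypothetical choice of ours (its exact integral is 4/3), and `integrate` is a simple midpoint rule rather than a library routine.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def g(x):
    """Hypothetical nonnegative function: x(2 - x) on [0, 2], zero elsewhere."""
    return x * (2 - x) if 0 <= x <= 2 else 0.0


c = integrate(g, 0.0, 2.0)          # normalizing constant of Eq. (4.14)


def pdf(x):
    """f_X(x) = g(x) / c satisfies the normalization condition (4.13)."""
    return g(x) / c


print(c, integrate(pdf, 0.0, 2.0))
```

Outside [0, 2] the function is set to zero, in line with the remark that the pdf must be defined for all real x.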


Example 4.6 Uniform Random Variable 


The pdf of the uniform random variable is given by:

$$f_X(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & x < a \text{ and } x > b \end{cases}$$   (4.15a)

and is shown in Fig. 4.2(b). The cdf is found from Eq. (4.12):

$$F_X(x) = \begin{cases} 0 & x < a \\ \dfrac{x - a}{b - a} & a \le x \le b \\ 1 & x > b. \end{cases}$$   (4.15b)

The cdf is shown in Fig. 4.2(a).


Example 4.7 Exponential Random Variable 


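The transmission time X of messages in a communication system has an exponential
distribution:

$$P[X > x] = e^{-\lambda x} \quad x > 0.$$

Find the cdf and pdf of X.

The cdf is given by $F_X(x) = 1 - P[X > x]$:

$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0. \end{cases}$$   (4.16a)

The pdf is obtained by applying Eq. (4.7):

$$f_X(x) = F_X'(x) = \begin{cases} 0 & x < 0 \\ \lambda e^{-\lambda x} & x \ge 0. \end{cases}$$   (4.16b)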


Example 4.8 Laplacian Random Variable 


The pdf of the samples of the amplitude of speech waveforms is found to decay exponentially at
a rate $\alpha$, so the following pdf is proposed:

$$f_X(x) = ce^{-\alpha |x|} \quad -\infty < x < \infty.$$   (4.17)

Find the constant c, and then find the probability P[|X| < v].

We use the normalization condition in (iv) to find c:

$$1 = \int_{-\infty}^{\infty} ce^{-\alpha |x|}\, dx = 2\int_{0}^{\infty} ce^{-\alpha x}\, dx = \frac{2c}{\alpha}.$$

Therefore $c = \alpha/2$. The probability P[|X| < v] is found by integrating the pdf:

$$P[|X| < v] = \frac{\alpha}{2}\int_{-v}^{v} e^{-\alpha |x|}\, dx = 2\left(\frac{\alpha}{2}\right)\int_{0}^{v} e^{-\alpha x}\, dx = 1 - e^{-\alpha v}.$$
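
The closed-form result of Example 4.8 can be checked by numerical integration. This is a sketch under assumed values: $\alpha = 1.5$ and v = 2.0 are arbitrary illustrative choices, and `integrate` is a plain midpoint rule of our own.

```python
import math


def laplacian_pdf(x, alpha):
    """f_X(x) = (alpha / 2) exp(-alpha |x|), using c = alpha / 2 from the example."""
    return 0.5 * alpha * math.exp(-alpha * abs(x))


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


alpha, v = 1.5, 2.0   # assumed decay rate and threshold
numeric = integrate(lambda x: laplacian_pdf(x, alpha), -v, v)
closed_form = 1 - math.exp(-alpha * v)
print(numeric, closed_form)
```

The two printed values should agree to several decimal places, which supports both the constant c and the expression for P[|X| < v].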
—v 0 


4.2.1 pdf of Discrete Random Variables 


The derivative of the cdf does not exist at points where the cdf is not continuous. Thus
the notion of pdf as defined by Eq. (4.7) does not apply to discrete random variables
at the points where the cdf is discontinuous. We can generalize the definition of the
probability density function by noting the relation between the unit step function and
the delta function. The unit step function is defined as

$$u(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0. \end{cases}$$   (4.18a)

The delta function $\delta(t)$ is related to the unit step function by the following equation:

$$u(x) = \int_{-\infty}^{x} \delta(t)\, dt.$$   (4.18b)

A translated unit step function is then:

$$u(x - x_0) = \int_{-\infty}^{x - x_0} \delta(t)\, dt = \int_{-\infty}^{x} \delta(t' - x_0)\, dt'.$$   (4.18c)

Substituting Eq. (4.18c) into the cdf of a discrete random variable:

$$F_X(x) = \sum_k p_X(x_k)\, u(x - x_k) = \sum_k p_X(x_k) \int_{-\infty}^{x} \delta(t - x_k)\, dt
= \int_{-\infty}^{x} \sum_k p_X(x_k)\, \delta(t - x_k)\, dt.$$   (4.19)

This suggests that we define the pdf for a discrete random variable by

$$f_X(x) = \frac{d}{dx} F_X(x) = \sum_k p_X(x_k)\, \delta(x - x_k).$$   (4.20)

Thus the generalized definition of pdf places a delta function of weight $P[X = x_k]$ at
the points $x_k$ where the cdf is discontinuous.

To provide some intuition on the delta function, consider a narrow rectangular
pulse of unit area and width $\Delta$ centered at t = 0:

$$\pi_\Delta(t) = \begin{cases} 1/\Delta & -\Delta/2 \le t \le \Delta/2 \\ 0 & |t| > \Delta/2. \end{cases}$$

Consider the integral of $\pi_\Delta(t)$:

$$\int_{-\infty}^{x} \pi_\Delta(t)\, dt = \begin{cases} \int_{-\infty}^{x} 0\, dt = 0 & \text{for } x < -\Delta/2 \\ \int_{-\Delta/2}^{\Delta/2} 1/\Delta\, dt = 1 & \text{for } x > \Delta/2 \end{cases} \;\to\; u(x).$$   (4.21)

As $\Delta \to 0$, we see that the integral of the narrow pulse approaches the unit step
function. For this reason, we visualize the delta function $\delta(t)$ as being zero everywhere
except at t = 0 where it is unbounded. The above equation does not apply at the value
x = 0. To maintain the right continuity in Eq. (4.18a), we use the convention:

$$u(0) = 1 = \int_{-\infty}^{0} \delta(t)\, dt.$$

If we replace $\pi_\Delta(t)$ in the above derivation with $g(t)\pi_\Delta(t)$, we obtain the “sifting”
property of the delta function:

$$g(0) = \int_{-\infty}^{\infty} g(t)\delta(t)\, dt \quad \text{and} \quad g(x_0) = \int_{-\infty}^{\infty} g(t)\delta(t - x_0)\, dt.$$   (4.22)

The delta function is viewed as sifting through x and picking out the value of g at the
point where the delta function is centered, that is, $g(x_0)$ for the expression on the right.

The pdf for the discrete random variable discussed in Example 4.1 is shown in
Fig. 4.1(b). The pdf of a random variable of mixed type will also contain delta functions
at the points where its cdf is not continuous. The pdf for the random variable discussed
in Example 4.3 is shown in Fig. 4.3(b).
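
The sifting property in Eq. (4.22) can be illustrated by replacing the delta function with a narrow rectangular pulse of unit area, as in the derivation of Eq. (4.21). The sketch below is an illustration under our own choices: the test function (cosine), the point 0.7, and the helper name `pulse_integral` are all assumptions.

```python
import math


def pulse_integral(g, x0, width, n=1000):
    """Integral of g(t) times a unit-area pulse of the given width centered at x0."""
    a = x0 - width / 2
    h = width / n
    # Inside the pulse the height is 1 / width; outside it the integrand is zero.
    return sum(g(a + (i + 0.5) * h) * (1 / width) * h for i in range(n))


x0 = 0.7
for width in (1.0, 0.1, 0.001):
    print(width, pulse_integral(math.cos, x0, width), math.cos(x0))
```

As the pulse narrows, the integral approaches g(x0), which is the behavior the delta function idealizes.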


Example 4.9 


Let X be the number of heads in three coin tosses as in Example 4.1. Find the pdf of X. Find
$P[1 < X \le 2]$ and $P[2 \le X < 3]$ by integrating the pdf.

In Example 4.1 we found that the cdf of X is given by

$$F_X(x) = \frac{1}{8}u(x) + \frac{3}{8}u(x - 1) + \frac{3}{8}u(x - 2) + \frac{1}{8}u(x - 3).$$

It then follows from Eqs. (4.18) and (4.19) that

$$f_X(x) = \frac{1}{8}\delta(x) + \frac{3}{8}\delta(x - 1) + \frac{3}{8}\delta(x - 2) + \frac{1}{8}\delta(x - 3).$$

When delta functions appear in the limits of integration, we must indicate whether the delta
functions are to be included in the integration. Thus in $P[1 < X \le 2] = P[X \text{ in } (1, 2]]$, the
delta function located at 1 is excluded from the integral and the delta function at 2 is included:

$$P[1 < X \le 2] = \int_{1^+}^{2^+} f_X(x)\, dx = \frac{3}{8}.$$

Similarly, we have that

$$P[2 \le X < 3] = \int_{2^-}^{3^-} f_X(x)\, dx = \frac{3}{8}.$$


Conditional cdf’s and pdf's 


Conditional cdf’s can be defined in a straightforward manner using the same approach
we used for conditional pmf’s. Suppose that event C is given and that P[C] > 0. The
conditional cdf of X given C is defined by

$$F_X(x \mid C) = \frac{P[\{X \le x\} \cap C]}{P[C]} \quad \text{if } P[C] > 0.$$   (4.23)

It is easy to show that $F_X(x \mid C)$ satisfies all the properties of a cdf. (See Problem 4.29.)
The conditional pdf of X given C is then defined by

$$f_X(x \mid C) = \frac{d}{dx} F_X(x \mid C).$$   (4.24)


Example 4.10 


The lifetime X of a machine has a continuous cdf $F_X(x)$. Find the conditional cdf and pdf given
the event C = {X > t} (i.e., “machine is still working at time t”).

The conditional cdf is

$$F_X(x \mid X > t) = P[X \le x \mid X > t] = \frac{P[\{X \le x\} \cap \{X > t\}]}{P[X > t]}.$$

The intersection of the two events in the numerator is equal to the empty set when x < t and to
$\{t < X \le x\}$ when $x \ge t$. Thus

$$F_X(x \mid X > t) = \begin{cases} 0 & x \le t \\ \dfrac{F_X(x) - F_X(t)}{1 - F_X(t)} & x > t. \end{cases}$$

The conditional pdf is found by differentiating with respect to x:

$$f_X(x \mid X > t) = \frac{f_X(x)}{1 - F_X(t)} \quad x \ge t.$$
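
As an illustration of the conditional cdf in Example 4.10, the sketch below assumes an exponential lifetime with rate `LAM = 0.5` (our choice, not part of the example) and evaluates $F_X(x \mid X > t)$. For the exponential case the conditional cdf of the remaining lifetime x - t coincides with the unconditional cdf, which the printed columns should show.

```python
import math

LAM = 0.5  # assumed failure rate, for illustration only


def cdf(x):
    """Exponential lifetime cdf: F_X(x) = 1 - exp(-LAM x) for x >= 0."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0


def conditional_cdf(x, t):
    """F_X(x | X > t) = (F_X(x) - F_X(t)) / (1 - F_X(t)) for x > t, else 0."""
    if x <= t:
        return 0.0
    return (cdf(x) - cdf(t)) / (1.0 - cdf(t))


t = 3.0
for s in (0.5, 1.0, 2.0):
    # Compare the cdf of the remaining lifetime with the unconditional cdf.
    print(s, conditional_cdf(t + s, t), cdf(s))
```

For other lifetime distributions the two columns generally differ; only the function `cdf` needs to change to explore that.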


Now suppose that we have a partition of the sample space S into the union of
disjoint events $B_1, B_2, \ldots, B_n$. Let $F_X(x \mid B_i)$ be the conditional cdf of X given event $B_i$.
The theorem on total probability allows us to find the cdf of X in terms of the
conditional cdf’s:

$$F_X(x) = P[X \le x] = \sum_{i=1}^{n} P[X \le x \mid B_i]\,P[B_i] = \sum_{i=1}^{n} F_X(x \mid B_i)\,P[B_i].$$   (4.25)

The pdf is obtained by differentiation:

$$f_X(x) = \frac{d}{dx} F_X(x) = \sum_{i=1}^{n} f_X(x \mid B_i)\,P[B_i].$$   (4.26)


Example 4.11 


A binary transmission system sends a “0” bit by transmitting a -v voltage signal, and a “1” bit by
transmitting a +v. The received signal is corrupted by Gaussian noise and given by:

$$Y = X + N$$

where X is the transmitted signal, and N is a noise voltage with pdf $f_N(x)$. Assume that
P[“1”] = p = 1 - P[“0”]. Find the pdf of Y.

Let $B_0$ be the event “0” is transmitted and $B_1$ be the event “1” is transmitted, then $B_0$, $B_1$
form a partition, and

$$F_Y(x) = F_Y(x \mid B_0)P[B_0] + F_Y(x \mid B_1)P[B_1]
= P[Y \le x \mid X = -v](1 - p) + P[Y \le x \mid X = v]\,p.$$

Since Y = X + N, the event $\{Y \le x \mid X = v\}$ is equivalent to $\{v + N \le x\}$ and $\{N \le x - v\}$,
and the event $\{Y \le x \mid X = -v\}$ is equivalent to $\{N \le x + v\}$. Therefore the conditional
cdf’s are:

$$F_Y(x \mid B_0) = P[N \le x + v] = F_N(x + v)$$

and

$$F_Y(x \mid B_1) = P[N \le x - v] = F_N(x - v).$$

The cdf is:

$$F_Y(x) = F_N(x + v)(1 - p) + F_N(x - v)\,p.$$

The pdf of Y is then:

$$f_Y(x) = \frac{d}{dx} F_Y(x) = \frac{d}{dx} F_N(x + v)(1 - p) + \frac{d}{dx} F_N(x - v)\,p
= f_N(x + v)(1 - p) + f_N(x - v)\,p.$$


The Gaussian random variable has pdf:

$$f_N(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2} \quad -\infty < x < \infty.$$

The conditional pdfs are:

$$f_Y(x \mid B_0) = f_N(x + v) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x + v)^2/2\sigma^2}$$

and

$$f_Y(x \mid B_1) = f_N(x - v) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - v)^2/2\sigma^2}.$$

The pdf of the received signal Y is then:

$$f_Y(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x + v)^2/2\sigma^2}(1 - p) + \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - v)^2/2\sigma^2}\, p.$$

Figure 4.5 shows the two conditional pdfs. We can see that the transmitted signal X shifts the
center of mass of the Gaussian pdf.


FIGURE 4.5
The conditional pdfs given the input signal.
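
The mixture pdf of Example 4.11 can be evaluated and sanity-checked numerically. The sketch below uses assumed values v = 1.0, $\sigma$ = 0.5, and p = 0.3, and a simple midpoint rule; the function names are our own. It checks that the pdf integrates to one and that its center of mass is (2p - 1)v, which follows from the noise having zero mean.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian noise pdf f_N(x)."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def received_pdf(x, v, sigma, p):
    """f_Y(x) = f_N(x + v)(1 - p) + f_N(x - v) p."""
    return (1 - p) * gaussian_pdf(x + v, sigma) + p * gaussian_pdf(x - v, sigma)


def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


v, sigma, p = 1.0, 0.5, 0.3   # assumed signal level, noise std, and P["1"]
total = integrate(lambda x: received_pdf(x, v, sigma, p), -10.0, 10.0)
mean = integrate(lambda x: x * received_pdf(x, v, sigma, p), -10.0, 10.0)
print(total, mean, (2 * p - 1) * v)
```

The integration range of [-10, 10] is wide enough here that the neglected Gaussian tails are far below the precision of the check.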


4.3 THE EXPECTED VALUE OF X


We discussed the expected value for discrete random variables in Section 3.3, and found
that the sample mean of independent observations of a random variable approaches
E[X]. Suppose we perform a series of such experiments for continuous random variables.
Since continuous random variables have P[X = x] = 0 for any specific value
of x, we divide the real line into small intervals and count the number of times $N_k(n)$
the observations fall in the interval $\{x_k < X < x_k + \Delta\}$. As n becomes large, then the
relative frequency $f_k(n) = N_k(n)/n$ will approach $f_X(x_k)\Delta$, the probability of the
interval. We calculate the sample mean in terms of the relative frequencies and let $n \to \infty$:

$$\langle X \rangle_n = \sum_k x_k f_k(n) \to \sum_k x_k f_X(x_k)\Delta.$$

The expression on the right-hand side approaches an integral as we decrease $\Delta$.

The expected value or mean of a random variable X is defined by

$$E[X] = \int_{-\infty}^{+\infty} t f_X(t)\, dt.$$   (4.27)

The expected value E[X] is defined if the above integral converges absolutely, that is,

$$E[|X|] = \int_{-\infty}^{+\infty} |t| f_X(t)\, dt < \infty.$$

If we view $f_X(x)$ as the distribution of mass on the real line, then E[X] represents the
center of mass of this distribution.

We already discussed E[X] for discrete random variables in detail, but it is worth
noting that the definition in Eq. (4.27) is applicable if we express the pdf of a discrete
random variable using delta functions:

$$E[X] = \int_{-\infty}^{+\infty} t \sum_k p_X(x_k)\,\delta(t - x_k)\, dt = \sum_k p_X(x_k) \int_{-\infty}^{+\infty} t\,\delta(t - x_k)\, dt = \sum_k p_X(x_k)\, x_k.$$


Example 4.12 Mean of a Uniform Random Variable


The mean for a uniform random variable is given by

$$E[X] = (b - a)^{-1} \int_a^b t\, dt = \frac{a + b}{2},$$

which is exactly the midpoint of the interval [a, b]. The results shown in Fig. 3.6 were obtained by
repeating experiments in which outcomes were random variables Y and X that had uniform cdf’s
in the intervals [-1, 1] and [3, 7], respectively. The respective expected values, 0 and 5,
correspond to the values about which Y and X tend to vary.


The result in Example 4.12 could have been found immediately by noting that
E[X] = m when the pdf is symmetric about a point m. That is, if

$$f_X(m - x) = f_X(m + x) \quad \text{for all } x,$$

then, assuming that the mean exists,

$$0 = \int_{-\infty}^{+\infty} (m - t) f_X(t)\, dt = m - \int_{-\infty}^{+\infty} t f_X(t)\, dt.$$

The first equality above follows from the symmetry of $f_X(t)$ about t = m and the odd
symmetry of (m - t) about the same point. We then have that E[X] = m.


Example 4.13 Mean of a Gaussian Random Variable 


The pdf of a Gaussian random variable is symmetric about the point x = m. Therefore E[ X] = m. 


The following expressions are useful when X is a nonnegative random variable:

$$E[X] = \int_0^{\infty} (1 - F_X(t))\, dt \quad \text{if X continuous and nonnegative}$$   (4.28)

and

$$E[X] = \sum_{k=0}^{\infty} P[X > k] \quad \text{if X nonnegative, integer-valued.}$$   (4.29)


The derivation of these formulas is discussed in Problem 4.47. 
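
Both tail formulas can be checked on simple cases. The sketch below is illustrative only and rests on our own assumptions: an exponential random variable with rate `LAM = 2.0` for Eq. (4.28), and a geometric pmf $P[X = k] = (1 - q)q^k$, k = 0, 1, 2, ..., with q = 0.6 for Eq. (4.29). The infinite integral and sums are truncated where the remaining terms are negligible.

```python
import math

# Eq. (4.28): exponential with assumed rate LAM, where 1 - F_X(t) = exp(-LAM t).
LAM = 2.0
N_STEPS = 20_000
UPPER = 20.0                      # truncation point; exp(-LAM * UPPER) is negligible
h = UPPER / N_STEPS
mean_exponential = h * sum(math.exp(-LAM * (i + 0.5) * h) for i in range(N_STEPS))

# Eq. (4.29): geometric pmf P[X = k] = (1 - q) q**k, k = 0, 1, 2, ... (assumed example)
q = 0.6
K = 500                           # truncation point for the infinite sums
mean_direct = sum(k * (1 - q) * q**k for k in range(K))   # sum of k * pmf
mean_tail = sum(q ** (k + 1) for k in range(K))           # P[X > k] = q**(k + 1)

print(mean_exponential, 1 / LAM)
print(mean_direct, mean_tail, q / (1 - q))
```

In both cases the tail-based computation agrees with the mean obtained directly from the pdf or pmf.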


Example 4.14 Mean of Exponential Random Variable 


The time X between customer arrivals at a service station has an exponential distribution. Find
the mean interarrival time.

Substituting Eq. (4.16b) into Eq. (4.27) we obtain

$$E[X] = \int_0^{\infty} t \lambda e^{-\lambda t}\, dt.$$

We evaluate the integral using integration by parts ($\int u\,dv = uv - \int v\,du$), with u = t and
$dv = \lambda e^{-\lambda t}\, dt$:

$$E[X] = -te^{-\lambda t}\Big|_0^{\infty} + \int_0^{\infty} e^{-\lambda t}\, dt
= \lim_{t \to \infty} (-te^{-\lambda t}) - 0 + \left[\frac{-e^{-\lambda t}}{\lambda}\right]_0^{\infty}
= \lim_{t \to \infty} \frac{-e^{-\lambda t}}{\lambda} + \frac{1}{\lambda} = \frac{1}{\lambda},$$

where we have used the fact that $e^{-\lambda t}$ and $te^{-\lambda t}$ go to zero as t approaches infinity.

For this example, Eq. (4.28) is much easier to evaluate:

$$E[X] = \int_0^{\infty} e^{-\lambda t}\, dt = \frac{1}{\lambda}.$$

Recall that $\lambda$ is the customer arrival rate in customers per second. The result that the mean
interarrival time $E[X] = 1/\lambda$ seconds per customer then makes sense intuitively.


4.3.1 The Expected Value of Y = g(X)


Suppose that we are interested in finding the expected value of Y = g(X). As in the
case of discrete random variables (Eq. (3.16)), E[Y] can be found directly in terms of
the pdf of X:

$$E[Y] = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx.$$   (4.30)

To see how Eq. (4.30) comes about, suppose that we divide the y-axis into intervals
of length h, we index the intervals with the index k and we let $y_k$ be the value in the
center of the kth interval. The expected value of Y is approximated by the following sum:

$$E[Y] \simeq \sum_k y_k f_Y(y_k)\, h.$$

Suppose that g(x) is strictly increasing, then the kth interval in the y-axis has a unique
corresponding equivalent event of width $h_k$ in the x-axis as shown in Fig. 4.6. Let $x_k$ be
the value in the kth interval such that $g(x_k) = y_k$, then since $f_Y(y_k)h = f_X(x_k)h_k$,

$$E[Y] \simeq \sum_k g(x_k) f_X(x_k)\, h_k.$$

By letting h approach zero, we obtain Eq. (4.30). This equation is valid even if g(x) is
not strictly increasing.


FIGURE 4.6
Two infinitesimal equivalent events.


Example 4.15 Expected Values of a Sinusoid with Random Phase 


Let $Y = a\cos(\omega t + \Theta)$ where a, $\omega$, and t are constants, and $\Theta$ is a uniform random variable in the interval $(0, 2\pi)$. The random variable Y results from sampling the amplitude of a sinusoid with random phase $\Theta$. Find the expected value of Y and the expected value of the power of Y, $Y^2$:


$$E[Y] = E[a\cos(\omega t + \Theta)] = \int_0^{2\pi} a\cos(\omega t + \theta)\,\frac{d\theta}{2\pi} = \frac{a}{2\pi}\sin(\omega t + \theta)\Big|_0^{2\pi} = \frac{a}{2\pi}\bigl[\sin(\omega t + 2\pi) - \sin(\omega t)\bigr] = 0.$$


The average power is


$$E[Y^2] = E[a^2\cos^2(\omega t + \Theta)] = E\!\left[\frac{a^2}{2} + \frac{a^2}{2}\cos(2\omega t + 2\Theta)\right] = \frac{a^2}{2} + \frac{a^2}{2}\int_0^{2\pi}\cos(2\omega t + 2\theta)\,\frac{d\theta}{2\pi} = \frac{a^2}{2}.$$


Note that these answers are in agreement with the time averages of sinusoids: the time average (“dc” value) of the sinusoid is zero; the time-average power is $a^2/2$.
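
The two expected values in Example 4.15 can also be estimated by simulation. This is a sketch under assumed values ($a = 3$, $\omega = 2$, $t = 0.7$, and a fixed seed); the function name `sinusoid_moments` is hypothetical.

```python
import math
import random

def sinusoid_moments(a, w, t, n=200_000, seed=1):
    """Estimate E[Y] and E[Y^2] for Y = a*cos(w*t + Theta), Theta uniform on (0, 2*pi)."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        y = a * math.cos(w * t + theta)
        total += y
        total_sq += y * y
    return total / n, total_sq / n

mean_y, power_y = sinusoid_moments(a=3.0, w=2.0, t=0.7)
print(mean_y, power_y)   # compare with 0 and a**2 / 2 = 4.5
```

The estimates do not depend on the sampling instant t, which is consistent with the analytical result.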
Example 4.16 Expected Values of the Indicator Function


Let $g(X) = I_C(X)$ be the indicator function for the event {X in C}, where C is some interval or union of intervals in the real line:


$$g(X) = \begin{cases} 0 & X \text{ not in } C \\ 1 & X \text{ in } C, \end{cases}$$


then


$$E[Y] = \int_{-\infty}^{+\infty} g(x) f_X(x)\,dx = \int_C f_X(x)\,dx = P[X \text{ in } C].$$


Thus the expected value of the indicator of an event is equal to the probability of the event. 


It is easy to show that Eqs. (3.17a)-(3.17e) hold for continuous random variables using Eq. (4.30). For example, let c be some constant, then


$$E[c] = \int_{-\infty}^{\infty} c f_X(x)\,dx = c\int_{-\infty}^{\infty} f_X(x)\,dx = c \tag{4.31}$$


and


$$E[cX] = \int_{-\infty}^{\infty} c x f_X(x)\,dx = c\int_{-\infty}^{\infty} x f_X(x)\,dx = cE[X]. \tag{4.32}$$


The expected value of a sum of functions of a random variable is equal to the sum of the expected values of the individual functions:


$$E[Y] = E\!\left[\sum_{k=1}^{n} g_k(X)\right] = \int_{-\infty}^{\infty}\sum_{k=1}^{n} g_k(x) f_X(x)\,dx = \sum_{k=1}^{n}\int_{-\infty}^{\infty} g_k(x) f_X(x)\,dx = \sum_{k=1}^{n} E[g_k(X)]. \tag{4.33}$$


Example 4.17


Let $Y = g(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n$, where the $a_k$ are constants, then


$$E[Y] = E[a_0] + E[a_1X] + \cdots + E[a_nX^n] = a_0 + a_1E[X] + a_2E[X^2] + \cdots + a_nE[X^n],$$


where we have used Eq. (4.33), and Eqs. (4.31) and (4.32). A special case of this result is that


$$E[X + c] = E[X] + c,$$


that is, we can shift the mean of a random variable by adding a constant to it.


Variance of X


The variance of the random variable X is defined by


$$\mathrm{VAR}[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2. \tag{4.34}$$


The standard deviation of the random variable X is defined by


$$\mathrm{STD}[X] = \mathrm{VAR}[X]^{1/2}. \tag{4.35}$$


Example 4.18 Variance of Uniform Random Variable 


Find the variance of the random variable X that is uniformly distributed in the interval [a, b]. 
Since the mean of X is (a + b)/2, 


$$\mathrm{VAR}[X] = \frac{1}{b-a}\int_a^b \left(x - \frac{a+b}{2}\right)^2 dx.$$


Let $y = x - (a+b)/2$,


$$\mathrm{VAR}[X] = \frac{1}{b-a}\int_{-(b-a)/2}^{(b-a)/2} y^2\,dy = \frac{(b-a)^2}{12}.$$


The random variables in Fig. 3.6 were uniformly distributed in the intervals [−1, 1] and [3, 7], respectively. Their variances are then 1/3 and 4/3. The corresponding standard deviations are 0.577 and 1.155.
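
As a quick check of Example 4.18, the closed-form variance can be compared with a direct numerical evaluation of the defining integral. This is only an illustrative sketch; the function names are hypothetical and the midpoint rule is an assumed choice of integration method.

```python
import math

def uniform_variance(a, b):
    """Closed form from Example 4.18: (b - a)^2 / 12."""
    return (b - a) ** 2 / 12.0

def variance_by_integration(a, b, n=100_000):
    """Midpoint-rule evaluation of (1/(b-a)) * integral of (x - mean)^2 over [a, b]."""
    mean = (a + b) / 2.0
    h = (b - a) / n
    total = sum((a + (i + 0.5) * h - mean) ** 2 for i in range(n))
    return total * h / (b - a)

for a, b in [(-1.0, 1.0), (3.0, 7.0)]:
    var = uniform_variance(a, b)
    print(a, b, var, variance_by_integration(a, b), math.sqrt(var))
```

For the two intervals of Fig. 3.6 this reproduces the variances 1/3 and 4/3 quoted above.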


Example 4.19 Variance of Gaussian Random Variable 


Find the variance of a Gaussian random variable. 
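First multiply the integral of the pdf of X by $\sqrt{2\pi}\,\sigma$ to obtain


$$\int_{-\infty}^{\infty} e^{-(x-m)^2/2\sigma^2}\,dx = \sqrt{2\pi}\,\sigma.$$


Differentiate both sides with respect to $\sigma$:


$$\int_{-\infty}^{\infty} \frac{(x-m)^2}{\sigma^3}\,e^{-(x-m)^2/2\sigma^2}\,dx = \sqrt{2\pi}.$$


By rearranging the above equation, we obtain


$$\mathrm{VAR}[X] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} (x-m)^2 e^{-(x-m)^2/2\sigma^2}\,dx = \sigma^2.$$


This result can also be obtained by direct integration. (See Problem 4.46.) Figure 4.7 shows the Gaussian pdf for several values of $\sigma$; it is evident that the “width” of the pdf increases with $\sigma$.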


The following properties were derived in Section 3.3: 


$$\mathrm{VAR}[c] = 0 \tag{4.36}$$
$$\mathrm{VAR}[X + c] = \mathrm{VAR}[X] \tag{4.37}$$
$$\mathrm{VAR}[cX] = c^2\,\mathrm{VAR}[X], \tag{4.38}$$


where c is a constant.


FIGURE 4.7
Probability density function of Gaussian random variable.


The mean and variance are the two most important parameters used in summarizing the pdf of a random variable. Other parameters are occasionally used. For example, the skewness defined by $E[(X - E[X])^3]/\mathrm{STD}[X]^3$ measures the degree of asymmetry about the mean. It is easy to show that if a pdf is symmetric about its mean, then its skewness is zero. The point to note with these parameters of the pdf is that each involves the expected value of a higher power of X. Indeed we show in a later section that, under certain conditions, a pdf is completely specified if the expected values of all the powers of X are known. These expected values are called the moments of X.

The nth moment of the random variable X is defined by


$$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx. \tag{4.39}$$


The mean and variance can be seen to be defined in terms of the first two moments, $E[X]$ and $E[X^2]$.


*Example 4.20 Analog-to-Digital Conversion: A Detailed Example 


A quantizer is used to convert an analog signal (e.g., speech or audio) into digital form. A quantizer maps a random voltage X into the nearest point q(X) from a set of $2^R$ representation values as shown in Fig. 4.8(a). The value X is then approximated by q(X), which is identified by an R-bit binary number. In this manner, an “analog” voltage X that can assume a continuum of values is converted into an R-bit number.

The quantizer introduces an error Z = X − q(X) as shown in Fig. 4.8(b). Note that Z is a function of X and that it ranges in value between −d/2 and d/2, where d is the quantizer step size. Suppose that X has a uniform distribution in the interval $[-x_{\max}, x_{\max}]$, that the quantizer has $2^R$ levels, and that $2x_{\max} = 2^R d$. It is easy to show that Z is uniformly distributed in the interval [−d/2, d/2] (see Problem 4.93).


FIGURE 4.8
(a) A uniform quantizer maps the input x into the closest point from the set {±d/2, ±3d/2, ±5d/2, ±7d/2}. (b) The uniform quantizer error for the input x is x − q(x).


Therefore from Example 4.12,


$$E[Z] = \frac{d/2 - d/2}{2} = 0.$$


The error Z thus has mean zero.
By Example 4.18,


$$\mathrm{VAR}[Z] = \frac{(d/2 - (-d/2))^2}{12} = \frac{d^2}{12}.$$


This result is approximately correct for any pdf that is approximately flat over each quantizer interval. This is the case when $2^R$ is large.
The approximation q(X) can be viewed as a “noisy” version of X since


$$q(X) = X - Z,$$


where Z is the quantization error. The measure of goodness of a quantizer is specified by the signal-to-noise ratio (SNR), which is defined as the ratio of the variance of the “signal” X to the variance of the distortion or “noise” Z:


$$\mathrm{SNR} = \frac{\mathrm{VAR}[X]}{\mathrm{VAR}[Z]} = \frac{\mathrm{VAR}[X]}{d^2/12} = \frac{\mathrm{VAR}[X]}{x_{\max}^2/3}\,2^{2R},$$


where we have used the fact that $d = 2x_{\max}/2^R$. When X is nonuniform, the value $x_{\max}$ is selected so that $P[|X| > x_{\max}]$ is small. A typical choice is $x_{\max} = 4\,\mathrm{STD}[X]$. The SNR is then


$$\mathrm{SNR} = \frac{3}{16}\,2^{2R}.$$


This important formula is often quoted in decibels:


$$\mathrm{SNR\ dB} = 10\log_{10}\mathrm{SNR} = 6R - 7.3\ \mathrm{dB}.$$


The SNR increases by a factor of 4 (6 dB) with each additional bit used to represent X. This makes sense since each additional bit doubles the number of quantizer levels, which in turn reduces the step size by a factor of 2. The variance of the error should then be reduced by the square of this, namely $2^2 = 4$.
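
The quantizer analysis of Example 4.20 can be illustrated with a short simulation. The sketch below assumes a uniform input on $[-x_{\max}, x_{\max}]$ with $x_{\max} = 1$, for which the SNR expression above reduces to $2^{2R}$, i.e., about 6.02R dB; the function names, sample size, and seed are hypothetical choices, not part of the original example.

```python
import math
import random

def quantize(x, d, levels):
    """Map x to the nearest representation point +-d/2, +-3d/2, ... of a uniform quantizer."""
    k = math.floor(x / d)                             # index of the cell containing x
    k = max(-(levels // 2), min(levels // 2 - 1, k))  # clamp to the outermost cells
    return (k + 0.5) * d

def simulated_snr_db(R, x_max=1.0, n=200_000, seed=1):
    """Estimate 10*log10(VAR[X]/VAR[Z]) for X uniform on [-x_max, x_max]."""
    rng = random.Random(seed)
    levels = 2 ** R
    d = 2.0 * x_max / levels
    xs = [rng.uniform(-x_max, x_max) for _ in range(n)]
    errs = [x - quantize(x, d, levels) for x in xs]
    var_x = sum(x * x for x in xs) / n     # X has zero mean
    var_z = sum(e * e for e in errs) / n   # Z has zero mean
    return 10.0 * math.log10(var_x / var_z)

def snr_db_four_sigma(R):
    """SNR in dB when x_max = 4 STD[X]: 10*log10((3/16) * 2^(2R))."""
    return 10.0 * math.log10(3.0 / 16.0 * 2.0 ** (2 * R))

for R in (4, 8):
    print(R, simulated_snr_db(R), snr_db_four_sigma(R))
```

Each additional bit should raise both figures by roughly 6 dB, and the second column should sit close to 6R − 7.3 dB.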


IMPORTANT CONTINUOUS RANDOM VARIABLES 


We are always limited to measurements of finite precision, so in effect, every random 
variable found in practice is a discrete random variable. Nevertheless, there are several 
compelling reasons for using continuous random variable models. First, in general, con- 
tinuous random variables are easier to handle analytically. Second, the limiting form of 
many discrete random variables yields continuous random variables. Finally, there are 
a number of “families” of continuous random variables that can be used to model a 
wide variety of situations by adjusting a few parameters. In this section we continue 
our introduction of important random variables. Table 4.1 lists some of the more im- 
portant continuous random variables. 


The Uniform Random Variable 


The uniform random variable arises in situations where all values in an interval of the real 
line are equally likely to occur. The uniform random variable U in the interval [a, b] has pdf: 


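$$f_U(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & x < a \text{ and } x > b \end{cases} \tag{4.40}$$


and cdf


$$F_U(x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b. \end{cases} \tag{4.41}$$


See Figure 4.2. The mean and variance of U are given by:


$$E[U] = \frac{a+b}{2} \quad\text{and}\quad \mathrm{VAR}[U] = \frac{(b-a)^2}{12}. \tag{4.42}$$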


The uniform random variable appears in many situations that involve equally 
likely continuous random variables. Obviously U can only be defined over intervals 
that are finite in length. We will see in Section 4.9 that the uniform random variable 
plays a crucial role in generating random variables in computer simulation models. 


The Exponential Random Variable 


The exponential random variable arises in the modeling of the time between occur- 
rence of events (e.g., the time between customer demands for call connections), and in 
the modeling of the lifetime of devices and systems. The exponential random variable 
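X with parameter $\lambda$ has pdf


$$f_X(x) = \begin{cases} 0 & x < 0 \\ \lambda e^{-\lambda x} & x \ge 0 \end{cases} \tag{4.43}$$


and cdf


$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0. \end{cases} \tag{4.44}$$


TABLE 4.1 Continuous random variables.

Uniform Random Variable
$S_X = [a, b]$
$f_X(x) = \dfrac{1}{b-a}$, $a \le x \le b$
$E[X] = \dfrac{a+b}{2}$, $\mathrm{VAR}[X] = \dfrac{(b-a)^2}{12}$, $\Phi_X(\omega) = \dfrac{e^{j\omega b} - e^{j\omega a}}{j\omega(b-a)}$

Exponential Random Variable
$S_X = [0, \infty)$
$f_X(x) = \lambda e^{-\lambda x}$, $x \ge 0$ and $\lambda > 0$
$E[X] = \dfrac{1}{\lambda}$, $\mathrm{VAR}[X] = \dfrac{1}{\lambda^2}$, $\Phi_X(\omega) = \dfrac{\lambda}{\lambda - j\omega}$
Remarks: The exponential random variable is the only continuous random variable with the memoryless property.

Gaussian (Normal) Random Variable
$S_X = (-\infty, +\infty)$
$f_X(x) = \dfrac{e^{-(x-m)^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}$, $-\infty < x < +\infty$ and $\sigma > 0$
$E[X] = m$, $\mathrm{VAR}[X] = \sigma^2$, $\Phi_X(\omega) = e^{jm\omega - \sigma^2\omega^2/2}$
Remarks: Under a wide range of conditions X can be used to approximate the sum of a large number of independent random variables.

Gamma Random Variable
$S_X = (0, +\infty)$
$f_X(x) = \dfrac{\lambda(\lambda x)^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}$, $x > 0$ and $\alpha > 0$, $\lambda > 0$
where $\Gamma(z)$ is the gamma function (Eq. 4.56).
$E[X] = \alpha/\lambda$, $\mathrm{VAR}[X] = \alpha/\lambda^2$, $\Phi_X(\omega) = \dfrac{1}{(1 - j\omega/\lambda)^{\alpha}}$

Special Cases of Gamma Random Variable

m-Erlang Random Variable: $\alpha = m$, a positive integer
$f_X(x) = \dfrac{\lambda e^{-\lambda x}(\lambda x)^{m-1}}{(m-1)!}$, $x > 0$, $\Phi_X(\omega) = \left(\dfrac{1}{1 - j\omega/\lambda}\right)^{m}$
Remarks: An m-Erlang random variable is obtained by adding m independent exponentially distributed random variables with parameter $\lambda$.

Chi-Square Random Variable with k degrees of freedom: $\alpha = k/2$, k a positive integer, and $\lambda = 1/2$
$f_X(x) = \dfrac{x^{(k-2)/2} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$, $x > 0$, $\Phi_X(\omega) = \left(\dfrac{1}{1 - 2j\omega}\right)^{k/2}$
Remarks: The sum of k mutually independent, squared zero-mean, unit-variance Gaussian random variables is a chi-square random variable with k degrees of freedom.

Laplacian Random Variable
$S_X = (-\infty, \infty)$
$f_X(x) = \dfrac{\alpha}{2} e^{-\alpha|x|}$, $-\infty < x < +\infty$ and $\alpha > 0$
$E[X] = 0$, $\mathrm{VAR}[X] = 2/\alpha^2$, $\Phi_X(\omega) = \dfrac{\alpha^2}{\omega^2 + \alpha^2}$

Rayleigh Random Variable
$S_X = [0, \infty)$
$f_X(x) = \dfrac{x}{\alpha^2} e^{-x^2/2\alpha^2}$, $x \ge 0$ and $\alpha > 0$
$E[X] = \alpha\sqrt{\pi/2}$, $\mathrm{VAR}[X] = (2 - \pi/2)\alpha^2$

Cauchy Random Variable
$f_X(x) = \dfrac{\alpha/\pi}{x^2 + \alpha^2}$, $-\infty < x < +\infty$ and $\alpha > 0$
Mean and variance do not exist. $\Phi_X(\omega) = e^{-\alpha|\omega|}$

Pareto Random Variable
$S_X = [x_m, \infty)$, $x_m > 0$.
$f_X(x) = \begin{cases} 0 & x < x_m \\ \alpha\dfrac{x_m^{\alpha}}{x^{\alpha+1}} & x \ge x_m \end{cases}$
$E[X] = \dfrac{\alpha x_m}{\alpha - 1}$ for $\alpha > 1$, $\mathrm{VAR}[X] = \dfrac{\alpha x_m^2}{(\alpha-2)(\alpha-1)^2}$ for $\alpha > 2$
Remarks: The Pareto random variable is the most prominent example of random variables with “long tails,” and can be viewed as a continuous version of the Zipf discrete random variable.

Beta Random Variable
$f_X(x) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1} & 0 < x < 1 \text{ and } \alpha > 0, \beta > 0 \\ 0 & \text{otherwise} \end{cases}$
$E[X] = \dfrac{\alpha}{\alpha+\beta}$, $\mathrm{VAR}[X] = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Remarks: The beta random variable is useful for modeling a variety of pdf shapes for random variables that range over finite intervals.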


The cdf and pdf of X are shown in Fig. 4.9. 

The parameter $\lambda$ is the rate at which events occur, so in Eq. (4.44) the probability of an event occurring by time x increases as the rate $\lambda$ increases. Recall from Example 3.31 that the interarrival time between events in a Poisson process (Fig. 3.10) is an exponential random variable.

The mean and variance of X are given by:

$$E[X] = \frac{1}{\lambda} \quad\text{and}\quad \mathrm{VAR}[X] = \frac{1}{\lambda^2}. \tag{4.45}$$

In event interarrival situations, $\lambda$ is in units of events/second and $1/\lambda$ is in units of seconds per event interarrival.
The exponential random variable satisfies the memoryless property:


$$P[X > t + h \mid X > t] = P[X > h]. \tag{4.46}$$


The expression on the left side is the probability of having to wait at least h additional 
seconds given that one has already been waiting t seconds. The expression on the right 
side is the probability of waiting at least h seconds when one first begins to wait. Thus 
the probability of waiting at least an additional h seconds is the same regardless of how 
long one has already been waiting! We see later in the book that the memoryless property of the exponential random variable makes it the cornerstone for the theory of Markov chains, which is used extensively in evaluating the performance of computer systems and communications networks.


FIGURE 4.9
An example of a continuous random variable: the exponential random variable. Part (a) is the cdf and part (b) is the pdf.

We now prove the memoryless property: 


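$$P[X > t + h \mid X > t] = \frac{P[\{X > t + h\} \cap \{X > t\}]}{P[X > t]} \quad \text{for } h > 0$$

$$= \frac{P[X > t + h]}{P[X > t]} = \frac{e^{-\lambda(t+h)}}{e^{-\lambda t}} = e^{-\lambda h} = P[X > h].$$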


It can be shown that the exponential random variable is the only continuous random 
variable that satisfies the memoryless property. 
Examples 2.13, 2.28, and 2.30 dealt with the exponential random variable. 
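
The memoryless property in Eq. (4.46) can be observed empirically. The following sketch draws exponential samples with Python's `random.expovariate` and compares the conditional and unconditional tail frequencies; the values $\lambda = 0.5$, $t = 1$, $h = 2$ and the seed are assumptions chosen for illustration.

```python
import math
import random

lam, t, h = 0.5, 1.0, 2.0     # assumed rate and waiting times
rng = random.Random(1)
samples = [rng.expovariate(lam) for _ in range(200_000)]

# Relative frequency of {X > t + h} among the outcomes with {X > t}
survivors = [x for x in samples if x > t]
conditional = sum(x > t + h for x in survivors) / len(survivors)

# Relative frequency of {X > h} among all outcomes
unconditional = sum(x > h for x in samples) / len(samples)

exact = math.exp(-lam * h)    # P[X > h] from Eq. (4.44)
print(conditional, unconditional, exact)
```

Both relative frequencies should be close to $e^{-\lambda h}$, regardless of the value chosen for t.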


The Gaussian (Normal) Random Variable 


There are many situations in manmade and in natural phenomena where one deals with a 
random variable X that consists of the sum of a large number of “small” random variables. 
The exact description of the pdf of X in terms of the component random variables can be- 
come quite complex and unwieldy. However, one finds that under very general conditions, 
as the number of components becomes large, the cdf of X approaches that of the Gaussian 
(normal) random variable.¹ This random variable appears so often in problems involving randomness that it has come to be known as the “normal” random variable.
The pdf for the Gaussian random variable X is given by


$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-m)^2/2\sigma^2} \qquad -\infty < x < \infty, \tag{4.47}$$


where m and $\sigma > 0$ are real numbers, which we showed in Examples 4.13 and 4.19 to be the mean and standard deviation of X. Figure 4.7 shows that the Gaussian pdf is a “bell-shaped” curve centered and symmetric about m and whose “width” increases with $\sigma$.
The cdf of the Gaussian random variable is given by


$$P[X \le x] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-(x'-m)^2/2\sigma^2}\,dx'. \tag{4.48}$$


The change of variable $t = (x' - m)/\sigma$ results in


$$F_X(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{(x-m)/\sigma} e^{-t^2/2}\,dt = \Phi\!\left(\frac{x-m}{\sigma}\right) \tag{4.49}$$


where $\Phi(x)$ is the cdf of a Gaussian random variable with m = 0 and $\sigma = 1$:


$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt. \tag{4.50}$$


Therefore any probability involving an arbitrary Gaussian random variable can be expressed in terms of $\Phi(x)$.

¹This result, called the central limit theorem, will be discussed in Chapter 7.


Example 4.21 


Show that the Gaussian pdf integrates to one. Consider the square of the integral of the pdf: 


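$$\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right]^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy.$$


Let $x = r\cos\theta$ and $y = r\sin\theta$ and carry out the change from Cartesian to polar coordinates; then we obtain:


$$\frac{1}{2\pi}\int_0^{\infty}\int_0^{2\pi} e^{-r^2/2}\,r\,dr\,d\theta = \int_0^{\infty} r e^{-r^2/2}\,dr = \left[-e^{-r^2/2}\right]_0^{\infty} = 1.$$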


In electrical engineering it is customary to work with the Q-function, which is de- 
fined by 


$$Q(x) = 1 - \Phi(x) \tag{4.51}$$

$$= \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt. \tag{4.52}$$


Q(x) is simply the probability of the “tail” of the pdf. The symmetry of the pdf implies that


$$Q(0) = 1/2 \quad\text{and}\quad Q(-x) = 1 - Q(x). \tag{4.53}$$


The integral in Eq. (4.50) does not have a closed-form expression. Traditionally the integrals have been evaluated by looking up tables that list Q(x) or by using approximations that require numerical evaluation [Ross]. The following expression has been found to give good accuracy for Q(x) over the entire range $0 < x < \infty$:


$$Q(x) \approx \left[\frac{1}{(1-a)x + a\sqrt{x^2 + b}}\right]\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \tag{4.54}$$


where $a = 1/\pi$ and $b = 2\pi$ [Gallager]. Table 4.2 shows Q(x) and the value given by the above approximation. In some problems, we are interested in finding the value of x for which $Q(x) = 10^{-k}$. Table 4.3 gives these values for k = 1, ..., 10.
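
The approximation in Eq. (4.54) is easy to compare against a library evaluation of Q(x). The sketch below uses the identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ with Python's `math.erfc` as the reference; the function names `q_exact` and `q_approx` are hypothetical.

```python
import math

def q_exact(x):
    """Q(x) = 1 - Phi(x), evaluated through the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_approx(x):
    """Approximation of Eq. (4.54) with a = 1/pi and b = 2*pi, intended for x >= 0."""
    a = 1.0 / math.pi
    b = 2.0 * math.pi
    denom = (1.0 - a) * x + a * math.sqrt(x * x + b)
    return math.exp(-x * x / 2.0) / (denom * math.sqrt(2.0 * math.pi))

for x in (0.0, 0.5, 1.0, 2.0, 3.0, 5.0):
    print(f"{x:4.1f}  {q_exact(x):.3e}  {q_approx(x):.3e}")
```

The printed pairs can be compared with the corresponding rows of Table 4.2.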

The Gaussian random variable plays a very important role in communication systems, where transmission signals are corrupted by noise voltages resulting from the thermal motion of electrons. It can be shown from physical principles that these voltages will have a Gaussian pdf.


TABLE 4.2 Comparison of Q(x) and approximation given by Eq. (4.54).

x      Q(x)       Approximation    x      Q(x)       Approximation
0      5.00E-01   5.00E-01         2.7    3.47E-03   3.46E-03
0.1    4.60E-01   4.58E-01         2.8    2.56E-03   2.55E-03
0.2    4.21E-01   4.17E-01         2.9    1.87E-03   1.86E-03
0.3    3.82E-01   3.78E-01         3.0    1.35E-03   1.35E-03
0.4    3.45E-01   3.41E-01         3.1    9.68E-04   9.66E-04
0.5    3.09E-01   3.05E-01         3.2    6.87E-04   6.86E-04
0.6    2.74E-01   2.71E-01         3.3    4.83E-04   4.83E-04
0.7    2.42E-01   2.39E-01         3.4    3.37E-04   3.36E-04
0.8    2.12E-01   2.09E-01         3.5    2.33E-04   2.32E-04
0.9    1.84E-01   1.82E-01         3.6    1.59E-04   1.59E-04
1.0    1.59E-01   1.57E-01         3.7    1.08E-04   1.08E-04
1.1    1.36E-01   1.34E-01         3.8    7.24E-05   7.23E-05
1.2    1.15E-01   1.14E-01         3.9    4.81E-05   4.81E-05
1.3    9.68E-02   9.60E-02         4.0    3.17E-05   3.16E-05
1.4    8.08E-02   8.01E-02         4.5    3.40E-06   3.40E-06
1.5    6.68E-02   6.63E-02         5.0    2.87E-07   2.87E-07
1.6    5.48E-02   5.44E-02         5.5    1.90E-08   1.90E-08
1.7    4.46E-02   4.43E-02         6.0    9.87E-10   9.86E-10
1.8    3.59E-02   3.57E-02         6.5    4.02E-11   4.02E-11
1.9    2.87E-02   2.86E-02         7.0    1.28E-12   1.28E-12
2.0    2.28E-02   2.26E-02         7.5    3.19E-14   3.19E-14
2.1    1.79E-02   1.78E-02         8.0    6.22E-16   6.22E-16
2.2    1.39E-02   1.39E-02         8.5    9.48E-18   9.48E-18
2.3    1.07E-02   1.07E-02         9.0    1.13E-19   1.13E-19
2.4    8.20E-03   8.17E-03         9.5    1.05E-21   1.05E-21
2.5    6.21E-03   6.19E-03         10.0   7.62E-24   7.62E-24
2.6    4.66E-03   4.65E-03


Example 4.22 


A communication system accepts a positive voltage V as input and outputs a voltage Y = aV + N, where $a = 10^{-2}$... 

TABLE 4.3 $Q(x) = 10^{-k}$

k     $x = Q^{-1}(10^{-k})$
1     1.2815
2     2.3263
3     3.0902
4     3.7190
5     4.2649
6     4.7535
7     5.1993
8     5.6120
9     5.9978
10    6.3613


The Gamma Random Variable


The gamma random variable is a versatile random variable that appears in many appli- 
cations. For example, it is used to model the time required to service customers in queue- 
ing systems, the lifetime of devices and systems in reliability studies, and the defect 
clustering behavior in VLSI chips. 
The pdf of the gamma random variable has two parameters, $\alpha > 0$ and $\lambda > 0$, and is given by


$$f_X(x) = \frac{\lambda(\lambda x)^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)} \qquad 0 < x < \infty, \tag{4.55}$$


where $\Gamma(z)$ is the gamma function, which is defined by the integral


$$\Gamma(z) = \int_0^{\infty} x^{z-1} e^{-x}\,dx \qquad z > 0. \tag{4.56}$$


The gamma function has the following properties:


$\Gamma(z + 1) = z\Gamma(z)$ for z > 0, and
$\Gamma(m + 1) = m!$ for m a nonnegative integer.


The versatility of the gamma random variable is due to the richness of the gamma function $\Gamma(z)$. The pdf of the gamma random variable can assume a variety of shapes as shown in Fig. 4.10. By varying the parameters $\alpha$ and $\lambda$ it is possible to fit the gamma pdf to many types of experimental data. In addition, many random variables are special cases of the gamma random variable. The exponential random variable is obtained by letting $\alpha = 1$. By letting $\lambda = 1/2$ and $\alpha = k/2$, where k is a positive integer, we obtain the chi-square random variable, which appears in certain statistical problems. The m-Erlang random variable is obtained when $\alpha = m$, a positive integer. The m-Erlang random variable is used in system reliability models and in queueing systems models. Both of these random variables are discussed in later examples.


FIGURE 4.10
Probability density function of gamma random variable.


Example 4.23


Show that the pdf of a gamma random variable integrates to one.
The integral of the pdf is


$$\int_0^{\infty} f_X(x)\,dx = \int_0^{\infty} \frac{\lambda(\lambda x)^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}\,dx = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\int_0^{\infty} x^{\alpha-1} e^{-\lambda x}\,dx.$$


Let $y = \lambda x$, then $dx = dy/\lambda$ and the integral becomes


$$\frac{\lambda^{\alpha}}{\Gamma(\alpha)\lambda^{\alpha}}\int_0^{\infty} y^{\alpha-1} e^{-y}\,dy = 1,$$


where we used the fact that the integral equals $\Gamma(\alpha)$.
In general, the cdf of the gamma random variable does not have a closed-form expression. We will show that the special case of the m-Erlang random variable does have a closed-form expression for the cdf by using its close interrelation with the exponential and Poisson random variables. The cdf can also be obtained by integration of the pdf (see Problem 4.74).


Consider once again the limiting procedure that was used to derive the Poisson random variable. Suppose that we observe the time $S_m$ that elapses until the occurrence of the mth event. The times $X_1, X_2, \ldots, X_m$ between events are exponential random variables, so we must have


$$S_m = X_1 + X_2 + \cdots + X_m.$$


We will show that $S_m$ is an m-Erlang random variable. To find the cdf of $S_m$, let N(t) be the Poisson random variable for the number of events in t seconds. Note that the mth event occurs before time t (that is, $S_m \le t$) if and only if m or more events occur in t seconds, namely $N(t) \ge m$. The reasoning goes as follows. If the mth event has occurred before time t, then it follows that m or more events will occur in time t. On the other hand, if m or more events occur in time t, then it follows that the mth event occurred by time t. Thus


$$F_{S_m}(t) = P[S_m \le t] = P[N(t) \ge m] \tag{4.57}$$

$$= 1 - \sum_{k=0}^{m-1} \frac{(\lambda t)^k}{k!}\,e^{-\lambda t}, \tag{4.58}$$


where we have used the result of Example 3.31. If we take the derivative of the above cdf, we finally obtain the pdf of the m-Erlang random variable. Thus we have shown that $S_m$ is an m-Erlang random variable.


Example 4.24 


A factory has two spares of a critical system component that has an average lifetime of $1/\lambda = 1$ month. Find the probability that the three components (the operating one and the two spares) will last more than 6 months. Assume the component lifetimes are exponential random variables.

The remaining lifetime of the component in service is an exponential random variable with rate $\lambda$ by the memoryless property. Thus, the total lifetime X of the three components is the sum of three exponential random variables with parameter $\lambda = 1$. Thus X has a 3-Erlang distribution with $\lambda = 1$. From Eq. (4.58) the probability that X is greater than 6 is

$$P[X > 6] = 1 - P[X \le 6] = \sum_{k=0}^{2} \frac{6^k}{k!}\,e^{-6} = .06197.$$
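
The calculation in Example 4.24 follows directly from Eq. (4.58) and can be reproduced in a few lines. In this sketch the function name `erlang_cdf` is hypothetical; the parameters m = 3, $\lambda = 1$, and t = 6 are those of the example.

```python
import math

def erlang_cdf(t, m, lam):
    """Eq. (4.58): P[S_m <= t] = 1 - sum_{k=0}^{m-1} (lam*t)^k / k! * exp(-lam*t)."""
    if t <= 0:
        return 0.0
    tail = sum((lam * t) ** k / math.factorial(k) for k in range(m)) * math.exp(-lam * t)
    return 1.0 - tail

# Probability that three components (lambda = 1 per month) last more than 6 months
p_survive = 1.0 - erlang_cdf(6.0, 3, 1.0)
print(round(p_survive, 5))
```

With m = 1 the same function reduces to the exponential cdf of Eq. (4.44).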


The Beta Random Variable 
The beta random variable X assumes values over a closed interval and has pdf: 

$$f_X(x) = c\,x^{a-1}(1-x)^{b-1} \qquad \text{for } 0 < x < 1 \tag{4.59}$$

where the normalization constant is the reciprocal of the beta function


$$\frac{1}{c} = B(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$$


and where the beta function is related to the gamma function by the following expression:

$$B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.$$


When a = b = 1, we have the uniform random variable. Other choices of a and b give pdfs over finite intervals that can differ markedly from the uniform. See Problem 4.75. If a = b > 1, then the pdf is symmetric about x = 1/2 and is concentrated about x = 1/2 as well. When a = b < 1, then the pdf is symmetric but the density is concentrated at the edges of the interval. When a < b (or a > b) the pdf is skewed to the right (or left).
The mean and variance are given by:


$$E[X] = \frac{a}{a+b} \quad\text{and}\quad \mathrm{VAR}[X] = \frac{ab}{(a+b)^2(a+b+1)}. \tag{4.60}$$


The versatility of the pdf of the beta random variable makes it useful to model a 
variety of behaviors for random variables that range over finite intervals. For example, 
in a Bernoulli trial experiment, the probability of success p could itself be a random 
variable. The beta pdf is frequently used to model p. 


The Cauchy Random Variable 
The Cauchy random variable X assumes values over the entire real line and has pdf: 
$$f_X(x) = \frac{1/\pi}{1 + x^2}. \tag{4.61}$$


It is easy to verify that this pdf integrates to 1. However, X does not have any moments 
since the associated integrals do not converge. The Cauchy random variable arises as 
the tangent of a uniform random variable in the unit interval. 


The Pareto Random Variable 


The Pareto random variable arises in the study of the distribution of wealth where it 
has been found to model the tendency for a small portion of the population to own a 
large portion of the wealth. Recently the Pareto distribution has been found to cap- 
ture the behavior of many quantities of interest in the study of Internet behavior, 
e.g., sizes of files, packet delays, audio and video title preferences, session times in 
peer-to-peer networks, etc. The Pareto random variable can be viewed as a continuous 
version of the Zipf discrete random variable. 
The Pareto random variable X takes on values in the range $x > x_m$, where $x_m$ is a positive real number. X has complementary cdf with shape parameter $\alpha > 0$ given by:

$$P[X > x] = \begin{cases} 1 & x < x_m \\ \dfrac{x_m^{\alpha}}{x^{\alpha}} & x \ge x_m. \end{cases} \tag{4.62}$$


The tail of X decays algebraically with x, which is rather slower in comparison to the exponential and Gaussian random variables. The Pareto random variable is the most prominent example of random variables with “long tails.”

The cdf and pdf of X are:


$$F_X(x) = \begin{cases} 0 & x < x_m \\ 1 - \dfrac{x_m^{\alpha}}{x^{\alpha}} & x \ge x_m. \end{cases} \tag{4.63}$$

Because of its long tail, the cdf of X approaches 1 rather slowly as x increases.


$$f_X(x) = \begin{cases} 0 & x < x_m \\ \alpha\dfrac{x_m^{\alpha}}{x^{\alpha+1}} & x \ge x_m. \end{cases} \tag{4.64}$$

Example 4.25 Mean and Variance of Pareto Random Variable 


Find the mean and variance of the Pareto random variable. 


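$$E[X] = \int_{x_m}^{\infty} t\,\alpha\frac{x_m^{\alpha}}{t^{\alpha+1}}\,dt = \int_{x_m}^{\infty} \alpha\frac{x_m^{\alpha}}{t^{\alpha}}\,dt = \frac{\alpha}{\alpha-1}\,\frac{x_m^{\alpha}}{x_m^{\alpha-1}} = \frac{\alpha x_m}{\alpha-1} \qquad \text{for } \alpha > 1 \tag{4.65}$$


$$E[X^2] = \int_{x_m}^{\infty} t^2\,\alpha\frac{x_m^{\alpha}}{t^{\alpha+1}}\,dt = \int_{x_m}^{\infty} \alpha\frac{x_m^{\alpha}}{t^{\alpha-1}}\,dt = \frac{\alpha}{\alpha-2}\,\frac{x_m^{\alpha}}{x_m^{\alpha-2}} = \frac{\alpha x_m^2}{\alpha-2} \qquad \text{for } \alpha > 2$$


where the second moment is defined for $\alpha > 2$.
The variance of X is then:


$$\mathrm{VAR}[X] = \frac{\alpha x_m^2}{\alpha-2} - \left(\frac{\alpha x_m}{\alpha-1}\right)^2 = \frac{\alpha x_m^2}{(\alpha-2)(\alpha-1)^2} \qquad \text{for } \alpha > 2. \tag{4.66}$$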
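
The Pareto moments in Eqs. (4.65) and (4.66) can be checked against samples. The sketch below generates Pareto values by inverting the cdf of Eq. (4.63) at a uniform random number, which is an assumed sampling method rather than one given in this example; the parameters $\alpha = 5$, $x_m = 1$, the seed, and all function names are illustrative.

```python
import random

def pareto_mean(alpha, x_m):
    """Eq. (4.65); the mean exists only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean does not exist for alpha <= 1")
    return alpha * x_m / (alpha - 1)

def pareto_variance(alpha, x_m):
    """Eq. (4.66); the variance exists only for alpha > 2."""
    if alpha <= 2:
        raise ValueError("variance does not exist for alpha <= 2")
    return alpha * x_m ** 2 / ((alpha - 2) * (alpha - 1) ** 2)

def pareto_sample(alpha, x_m, rng):
    """Solve F_X(x) = u for x, with u uniform on [0, 1): x = x_m * (1 - u)^(-1/alpha)."""
    u = rng.random()
    return x_m * (1.0 - u) ** (-1.0 / alpha)

rng = random.Random(1)
alpha, x_m, n = 5.0, 1.0, 200_000
samples = [pareto_sample(alpha, x_m, rng) for _ in range(n)]
sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n
print(sample_mean, pareto_mean(alpha, x_m))
print(sample_var, pareto_variance(alpha, x_m))
```

Because of the long tail, the sample variance converges slowly, and for values of $\alpha$ close to 2 the agreement with Eq. (4.66) can be poor even for large n.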


FUNCTIONS OF A RANDOM VARIABLE 


Let X be a random variable and let g(x) be a real-valued function defined on the real 
line. Define Y = g(X), that is, Y is determined by evaluating the function g(x) at the 
value assumed by the random variable X. Then Y is also a random variable. The probabilities with which Y takes on various values depend on the function g(x) as well as the cumulative distribution function of X. In this section we consider the problem of finding the cdf and pdf of Y.


Example 4.26
Let the function $h(x) = (x)^+$ be defined as follows:


$$(x)^+ = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \ge 0. \end{cases}$$


For example, let X be the number of active speakers in a group of N speakers, and let Y be the number of active speakers in excess of M, then $Y = (X - M)^+$. In another example, let X be a voltage input to a halfwave rectifier, then $Y = (X)^+$ is the output.


Example 4.27


Let the function q(x) be defined as shown in Fig. 4.8(a), where the set of points on the real line are mapped into the nearest representation point from the set $S_Y$ = {−3.5d, −2.5d, −1.5d, −0.5d, 0.5d, 1.5d, 2.5d, 3.5d}. Thus, for example, all the points in the interval (0, d) are mapped into the point d/2. The function q(x) represents an eight-level uniform quantizer.


Example 4.28 


Consider the linear function c(x) = ax + b, where a and b are constants. This function arises in 
many situations. For example, c(x) could be the cost associated with the quantity x, with the constant 
a being the cost per unit of x, and b being a fixed cost component. In a signal processing context, 
c(x) = ax could be the amplified version (if a > 1) or attenuated version (if a < 1) of the voltage x. 


The probability of an event C involving Y is equal to the probability of the equiv- 
alent event B of values of X such that g(X) is in C: 


P[Y inC] = P[g(X) in C] = P[X in B]. 


Three types of equivalent events are useful in determining the cdf and pdf of Y = g(X): (1) the event $\{g(X) = y_k\}$ is used to determine the magnitude of the jump at a point $y_k$ where the cdf of Y is known to have a discontinuity; (2) the event $\{g(X) \le y\}$ is used to find the cdf of Y directly; and (3) the event $\{y < g(X) \le y + h\}$ is useful in determining the pdf of Y. We will demonstrate the use of these three methods in a series of examples.

The next two examples demonstrate how the pmf is computed in cases where 
Y = g(X) is discrete. In the first example, X is discrete. In the second example, X is 
continuous. 


Example 4.29 


Let X be the number of active speakers in a group of N independent speakers. Let p be the prob- 
ability that a speaker is active. In Example 2.39 it was shown that X has a binomial distribution 
with parameters N and p. Suppose that a voice transmission system can transmit up to M voice 
signals at a time, and that when X exceeds M, X — M randomly selected signals are discarded. 
Let Y be the number of signals discarded, then 


$$Y = (X - M)^+.$$


Y takes on values from the set $S_Y = \{0, 1, \ldots, N - M\}$. Y will equal zero whenever X is less than or equal to M, and Y will equal k > 0 when X is equal to M + k. Therefore


$$P[Y = 0] = P[X \text{ in } \{0, 1, \ldots, M\}] = \sum_{j=0}^{M} p_j$$


and


$$P[Y = k] = P[X = M + k] = p_{M+k} \qquad 0 < k \le N - M,$$


where $p_j$ is the pmf of X.


Example 4.30


Let X be a sample voltage of a speech waveform, and suppose that X has a uniform distribution in the interval [−4d, 4d]. Let Y = q(X), where the quantizer input-output characteristic is as shown in Fig. 4.8(a). Find the pmf for Y.

The event {Y = q} for q in $S_Y$ is equivalent to the event {X in $I_q$}, where $I_q$ is an interval of points mapped into the representation point q. The pmf of Y is therefore found by evaluating


$$P[Y = q] = \int_{I_q} f_X(t)\,dt.$$


It is easy to see that each representation point has an interval of length d mapped into it. Thus the eight possible outputs are equiprobable, that is, P[Y = q] = 1/8 for q in $S_Y$.


In Example 4.30, each constant section of the function q(X) produces a delta 
function in the pdf of Y. In general, if the function g(X) is constant during certain in- 
tervals and if the pdf of X is nonzero in these intervals, then the pdf of Y will contain 
delta functions. Y will then be either discrete or of mixed type. 

The cdf of Y is defined as the probability of the event $\{Y \le y\}$. In principle, it can always be obtained by finding the probability of the equivalent event $\{g(X) \le y\}$ as shown in the next examples.


Example 4.31 A Linear Function 
Let the random variable Y be defined by 
Y = aX +b, 


where a is a nonzero constant. Suppose that X has cdf $F_X(x)$, then find $F_Y(y)$.
The event $\{Y \le y\}$ occurs when $A = \{aX + b \le y\}$ occurs. If a > 0, then $A = \{X \le (y - b)/a\}$ (see Fig. 4.11), and thus


$$F_Y(y) = P\!\left[X \le \frac{y-b}{a}\right] = F_X\!\left(\frac{y-b}{a}\right) \qquad a > 0.$$


On the other hand, if a < 0, then $A = \{X \ge (y - b)/a\}$, and


$$F_Y(y) = P\!\left[X \ge \frac{y-b}{a}\right] = 1 - F_X\!\left(\frac{y-b}{a}\right) \qquad a < 0.$$


We can obtain the pdf of Y by differentiating with respect to y. To do this we need to use the chain rule for derivatives:


$$\frac{dF}{dy} = \frac{dF}{du}\,\frac{du}{dy},$$

where u is the argument of F. In this case, u = (y − b)/a, and we then obtain


$$f_Y(y) = \frac{1}{a}\,f_X\!\left(\frac{y-b}{a}\right) \qquad a > 0$$


and

$$f_Y(y) = \frac{1}{-a}\,f_X\!\left(\frac{y-b}{a}\right) \qquad a < 0.$$

The above two results can be written compactly as

$$f_Y(y) = \frac{1}{|a|}\,f_X\!\left(\frac{y-b}{a}\right). \tag{4.67}$$


FIGURE 4.11
The equivalent event for $\{Y \le y\}$ is the event $\{X \le (y - b)/a\}$, if a > 0.


Example 4.32 A Linear Function of a Gaussian Random Variable 


Let $X$ be a random variable with a Gaussian pdf with mean $m$ and standard deviation $\sigma$:

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x - m)^2/2\sigma^2}, \qquad -\infty < x < \infty. \tag{4.68}$$

Let $Y = aX + b$; find the pdf of $Y$.
Substitution of Eq. (4.68) into Eq. (4.67) yields

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,|a\sigma|}\,e^{-(y - b - am)^2/2(a\sigma)^2}.$$

Note that $Y$ also has a Gaussian distribution with mean $b + am$ and standard deviation $|a|\sigma$.
Therefore a linear function of a Gaussian random variable is also a Gaussian random variable.
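The conclusion of Example 4.32 can be checked numerically. The following Python sketch (Python, the helper names, and the values of $m$, $\sigma$, $a$, and $b$ are our own illustrative assumptions, not part of the text) applies Eq. (4.67) to the Gaussian pdf of Eq. (4.68) and compares the result with a Gaussian pdf of mean $b + am$ and standard deviation $|a|\sigma$.

```python
import math

def gaussian_pdf(m, sigma):
    """Gaussian pdf of Eq. (4.68) with mean m and standard deviation sigma."""
    def f(x):
        return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return f

def linear_transform_pdf(f_x, a, b):
    """pdf of Y = a*X + b obtained from the pdf of X via Eq. (4.67)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    def f_y(y):
        return f_x((y - b) / a) / abs(a)
    return f_y

# Illustrative values (assumptions, not taken from the text).
m, sigma, a, b = 1.0, 2.0, -3.0, 0.5
f_y = linear_transform_pdf(gaussian_pdf(m, sigma), a, b)
f_y_gauss = gaussian_pdf(b + a * m, abs(a) * sigma)

# Compare the two expressions on a grid of y values.
grid = [-20.0 + 0.5 * i for i in range(81)]
max_gap = max(abs(f_y(y) - f_y_gauss(y)) for y in grid)
print(max_gap)
```

The two pdfs should agree to within floating-point rounding, including for negative $a$, which is the case that requires the absolute value in Eq. (4.67).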


Example 4.33 
Let the random variable $Y$ be defined by

$$Y = X^2,$$

where $X$ is a continuous random variable. Find the cdf and pdf of $Y$.

The event $\{Y \le y\}$ occurs when $\{X^2 \le y\}$ or equivalently when $\{-\sqrt{y} \le X \le \sqrt{y}\}$
for $y$ nonnegative; see Fig. 4.12. The event is null when $y$ is negative. Thus

$$F_Y(y) = \begin{cases} 0 & y < 0 \\ F_X(\sqrt{y}) - F_X(-\sqrt{y}) & y > 0, \end{cases}$$

and differentiating with respect to $y$,

$$f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} - \frac{f_X(-\sqrt{y})}{-2\sqrt{y}} = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}}, \qquad y > 0. \tag{4.69}$$

FIGURE 4.12
The equivalent event for $\{Y \le y\}$ is the event $\{-\sqrt{y} \le X \le \sqrt{y}\}$, if $y \ge 0$.


Example 4.34 A Chi-Square Random Variable 


Let $X$ be a Gaussian random variable with mean $m = 0$ and standard deviation $\sigma = 1$. $X$ is then
said to be a standard normal random variable. Let $Y = X^2$. Find the pdf of $Y$.
Substitution of Eq. (4.68) into Eq. (4.69) yields

$$f_Y(y) = \frac{e^{-y/2}}{\sqrt{2\pi y}}, \qquad y \ge 0. \tag{4.70}$$

From Table 4.1 we see that $f_Y(y)$ is the pdf of a chi-square random variable with one degree of
freedom.
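As a quick check of the substitution in Example 4.34, the sketch below (in Python, with function names of our own choosing) evaluates the two-term expression of Eq. (4.69) for a standard normal $X$ and compares it with the closed form of Eq. (4.70) at a few illustrative points.

```python
import math

def std_normal_pdf(x):
    """Eq. (4.68) with m = 0 and sigma = 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def pdf_of_square(f_x, y):
    """pdf of Y = X**2 at y > 0 from Eq. (4.69): one term per root of y = x**2."""
    root = math.sqrt(y)
    return (f_x(root) + f_x(-root)) / (2 * root)

def chi_square_one_pdf(y):
    """Closed form of Eq. (4.70)."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)

ys = [0.1, 0.5, 1.0, 2.0, 5.0]   # illustrative evaluation points
gaps = [abs(pdf_of_square(std_normal_pdf, y) - chi_square_one_pdf(y)) for y in ys]
print(max(gaps))
```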


The result in Example 4.33 suggests that if the equation $y_0 = g(x)$ has $n$ solutions,
$x_1, x_2, \ldots, x_n$, then $f_Y(y_0)$ will be equal to $n$ terms of the type on the right-hand
side of Eq. (4.69). We now show that this is generally true by using a method for directly
obtaining the pdf of $Y$ in terms of the pdf of $X$.

FIGURE 4.13
The equivalent event of $\{y < Y < y + dy\}$ is $\{x_1 < X < x_1 + dx_1\} \cup \{x_2 + dx_2 < X < x_2\} \cup \{x_3 < X < x_3 + dx_3\}$.

Consider a nonlinear function $Y = g(X)$ such as the one shown in Fig. 4.13. Consider
the event $C_y = \{y < Y < y + dy\}$ and let $B_y$ be its equivalent event. For $y$ indicated
in the figure, the equation $g(x) = y$ has three solutions $x_1$, $x_2$, and $x_3$, and the
equivalent event $B_y$ has a segment corresponding to each solution:

$$B_y = \{x_1 < X < x_1 + dx_1\} \cup \{x_2 + dx_2 < X < x_2\} \cup \{x_3 < X < x_3 + dx_3\}.$$

The probability of the event $C_y$ is approximately

$$P[C_y] = f_Y(y)\,|dy|, \tag{4.71}$$

where $|dy|$ is the length of the interval $y < Y \le y + dy$. Similarly, the probability of
the event $B_y$ is approximately

$$P[B_y] = f_X(x_1)\,|dx_1| + f_X(x_2)\,|dx_2| + f_X(x_3)\,|dx_3|. \tag{4.72}$$

Since $C_y$ and $B_y$ are equivalent events, their probabilities must be equal. By equating
Eqs. (4.71) and (4.72) we obtain

$$f_Y(y) = \sum_k \left.\frac{f_X(x)}{|dy/dx|}\right|_{x = x_k} \tag{4.73}$$

$$= \sum_k \left. f_X(x)\left|\frac{dx}{dy}\right|\,\right|_{x = x_k}. \tag{4.74}$$


It is clear that if the equation $g(x) = y$ has $n$ solutions, the expression for the pdf of $Y$
at that point is given by Eqs. (4.73) and (4.74), and contains $n$ terms.

Example 4.35

Let $Y = X^2$ as in Example 4.34. For $y \ge 0$, the equation $y = x^2$ has two solutions, $x_0 = \sqrt{y}$ and
$x_1 = -\sqrt{y}$, so Eq. (4.73) has two terms. Since $dy/dx = 2x$, Eq. (4.73) yields

$$f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}}.$$

This result is in agreement with Eq. (4.69). To use Eq. (4.74), we note that

$$\frac{dx}{dy} = \frac{d}{dy}\left(\pm\sqrt{y}\right) = \pm\frac{1}{2\sqrt{y}},$$

which when substituted into Eq. (4.74) then yields Eq. (4.69) again.


Example 4.36 Amplitude Samples of a Sinusoidal Waveform 


Let $Y = \cos(X)$, where $X$ is uniformly distributed in the interval $(0, 2\pi]$. $Y$ can be viewed as the
sample of a sinusoidal waveform at a random instant of time that is uniformly distributed over
the period of the sinusoid. Find the pdf of $Y$.

It can be seen in Fig. 4.14 that for $-1 < y < 1$ the equation $y = \cos(x)$ has two solutions in
the interval of interest, $x_0 = \cos^{-1}(y)$ and $x_1 = 2\pi - x_0$. Since (see an introductory calculus
textbook)

$$\left.\frac{dy}{dx}\right|_{x_0} = -\sin(x_0) = -\sin(\cos^{-1}(y)) = -\sqrt{1 - y^2},$$

and since $f_X(x) = 1/2\pi$ in the interval of interest, Eq. (4.73) yields

$$f_Y(y) = \frac{1}{2\pi\sqrt{1 - y^2}} + \frac{1}{2\pi\sqrt{1 - y^2}} = \frac{1}{\pi\sqrt{1 - y^2}} \qquad \text{for } -1 < y < 1.$$

FIGURE 4.14
$y = \cos x$ has two roots in the interval $(0, 2\pi)$.

The cdf of $Y$ is found by integrating the above:

$$F_Y(y) = \begin{cases} 0 & y < -1 \\ \dfrac{1}{2} + \dfrac{\sin^{-1} y}{\pi} & -1 \le y \le 1 \\ 1 & y > 1. \end{cases}$$

$Y$ is said to have the arcsine distribution.
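The arcsine cdf can be compared against simulated samples of $Y = \cos(X)$. The following Python sketch is only an illustration: the seed, the sample size, and the function names are assumptions of ours, and the agreement it shows is statistical rather than exact.

```python
import math
import random

def arcsine_cdf(y):
    """cdf of Y = cos(X) for X uniform over one period."""
    if y < -1:
        return 0.0
    if y > 1:
        return 1.0
    return 0.5 + math.asin(y) / math.pi

def empirical_cdf(samples, y):
    """Fraction of samples that are less than or equal to y."""
    return sum(s <= y for s in samples) / len(samples)

rng = random.Random(12345)          # fixed seed so the run is repeatable
n = 100_000
samples = [math.cos(2 * math.pi * rng.random()) for _ in range(n)]

for y in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(y, round(empirical_cdf(samples, y), 3), round(arcsine_cdf(y), 3))
```

The samples cluster near $y = \pm 1$, where the pdf $1/(\pi\sqrt{1 - y^2})$ grows without bound, and are sparsest near $y = 0$.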


THE MARKOV AND CHEBYSHEV INEQUALITIES 


In general, the mean and variance of a random variable do not provide enough information
to determine the cdf/pdf. However, the mean and variance of a random variable
$X$ do allow us to obtain bounds for probabilities of the form $P[|X| \ge t]$. Suppose
first that $X$ is a nonnegative random variable with mean $E[X]$. The Markov inequality
then states that

$$P[X \ge a] \le \frac{E[X]}{a} \qquad \text{for } X \text{ nonnegative}. \tag{4.75}$$

We obtain Eq. (4.75) as follows:

$$E[X] = \int_0^a t f_X(t)\,dt + \int_a^\infty t f_X(t)\,dt \ge \int_a^\infty t f_X(t)\,dt \ge \int_a^\infty a f_X(t)\,dt = a P[X \ge a].$$

The first inequality results from discarding the integral from zero to $a$; the second
inequality results from replacing $t$ with the smaller number $a$.


Example 4.37 


The mean height of children in a kindergarten class is 3 feet, 6 inches. Find the bound on the probability
that a child in the class is taller than 9 feet. The Markov inequality gives
$P[H \ge 9] \le 42/108 = .389$.


The bound in the above example appears to be ridiculous. However, a bound, by 
its very nature, must take the worst case into consideration. One can easily construct a 
random variable for which the bound given by the Markov inequality is exact. The rea- 
son we know that the bound in the above example is ridiculous is that we have knowl- 
edge about the variability of the children’s height about their mean. 

Now suppose that the mean $E[X] = m$ and the variance $\text{VAR}[X] = \sigma^2$ of a
random variable are known, and that we are interested in bounding $P[|X - m| \ge a]$.
The Chebyshev inequality states that

$$P[|X - m| \ge a] \le \frac{\sigma^2}{a^2}. \tag{4.76}$$

The Chebyshev inequality is a consequence of the Markov inequality. Let $D^2 = (X - m)^2$
be the squared deviation from the mean. Then the Markov inequality applied to
$D^2$ gives

$$P[D^2 \ge a^2] \le \frac{E[(X - m)^2]}{a^2} = \frac{\sigma^2}{a^2}.$$

Equation (4.76) follows when we note that $\{D^2 \ge a^2\}$ and $\{|X - m| \ge a\}$ are equivalent
events.

Suppose that a random variable X has zero variance; then the Chebyshev in- 
equality implies that 


P[X = m] = 1, (4.77) 


that is, the random variable is equal to its mean with probability one. In other words, X 
is equal to the constant m in almost all experiments. 


Example 4.38 


The mean response time and the standard deviation in a multi-user computer system are known 
to be 15 seconds and 3 seconds, respectively. Estimate the probability that the response time is 
more than 5 seconds from the mean. 

The Chebyshev inequality with $m = 15$ seconds, $\sigma = 3$ seconds, and $a = 5$ seconds gives

$$P[|X - 15| \ge 5] \le \frac{9}{25} = .36.$$


Example 4.39 


If $X$ has mean $m$ and variance $\sigma^2$, then the Chebyshev inequality for $a = k\sigma$ gives

$$P[|X - m| \ge k\sigma] \le \frac{1}{k^2}.$$

Now suppose that we know that $X$ is a Gaussian random variable; then for $k = 2$, $P[|X - m| \ge 2\sigma]$
$= .0456$, whereas the Chebyshev inequality gives the upper bound $.25$.
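The comparison in Example 4.39 can be repeated for other values of $k$. The Python sketch below (function names are ours) computes the Chebyshev bound $1/k^2$ and the exact two-sided Gaussian tail, which we express through the complementary error function as $2Q(k) = \mathrm{erfc}(k/\sqrt{2})$.

```python
import math

def chebyshev_bound(k):
    """Upper bound on P[|X - m| >= k*sigma] from Eq. (4.76) with a = k*sigma."""
    return min(1.0, 1.0 / k ** 2)

def gaussian_two_sided_tail(k):
    """Exact P[|X - m| >= k*sigma] for a Gaussian X, that is, 2Q(k)."""
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(k, chebyshev_bound(k), gaussian_two_sided_tail(k))
```

For $k = 2$ the exact value is about .0455, in agreement with the value quoted above up to rounding, and the gap between the bound and the exact probability widens rapidly as $k$ grows.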


Example 4.40 Chebyshev Bound Is Tight 


Let the random variable $X$ have $P[X = -v] = P[X = v] = 0.5$. The mean is zero and the variance
is $\text{VAR}[X] = E[X^2] = (-v)^2\,0.5 + v^2\,0.5 = v^2$.
Note that $P[|X| \ge v] = 1$. The Chebyshev inequality states:

$$P[|X| \ge v] \le \frac{\text{VAR}[X]}{v^2} = 1.$$

We see that the bound and the exact value are in agreement, so the bound is tight.

We see from Example 4.39 that for certain random variables, the Chebyshev inequality
can give rather loose bounds. Nevertheless, the inequality is useful in situations
in which we have no knowledge about the distribution of a given random variable other 
than its mean and variance. In Section 7.2, we will use the Chebyshev inequality to prove 
that the arithmetic average of independent measurements of the same random variable 
is highly likely to be close to the expected value of the random variable when the num- 
ber of measurements is large. Problems 4.100 and 4.101 give examples of this result. 

If more information is available than just the mean and variance, then it is possi- 
ble to obtain bounds that are tighter than the Markov and Chebyshev inequalities. 
Consider the Markov inequality again. The region of interest is $A = \{t \ge a\}$, so let
$I_A(t)$ be the indicator function, that is, $I_A(t) = 1$ if $t \in A$ and $I_A(t) = 0$ otherwise. The
key step in the derivation is to note that $t/a \ge 1$ in the region of interest. In effect we
bounded $I_A(t)$ by $t/a$ as shown in Fig. 4.15. We then have:

$$P[X \ge a] = \int_0^\infty I_A(t) f_X(t)\,dt \le \int_0^\infty \frac{t}{a} f_X(t)\,dt = \frac{E[X]}{a}.$$

By changing the upper bound on $I_A(t)$, we can obtain different bounds on $P[X \ge a]$.
Consider the bound $I_A(t) \le e^{s(t - a)}$, also shown in Fig. 4.15, where $s > 0$. The resulting
bound is:

$$P[X \ge a] = \int_0^\infty I_A(t) f_X(t)\,dt \le \int_0^\infty e^{s(t - a)} f_X(t)\,dt = e^{-sa}\int_0^\infty e^{st} f_X(t)\,dt = e^{-sa}E[e^{sX}]. \tag{4.78}$$

This bound is called the Chernoff bound, which can be seen to depend on the expected
value of an exponential function of $X$. This function is called the moment generating
function and is related to the transforms that are introduced in the next section. We develop
the Chernoff bound further in the next section.


FIGURE 4.15
Bounds on indicator function for $A = \{t \ge a\}$.

TRANSFORM METHODS


In the old days, before calculators and computers, it was very handy to have loga- 
rithm tables around if your work involved performing a large number of multiplica- 
tions. If you wanted to multiply the numbers x and y, you looked up log(x) and 
log(y), added log(x) and log(y), and then looked up the inverse logarithm of the 
result. You probably remember from grade school that longhand multiplication is 
more tedious and error-prone than addition. Thus logarithms were very useful as a 
computational aid. 

Transform methods are extremely useful computational aids in the solution of
equations that involve derivatives and integrals of functions. In many of these problems,
the solution is given by the convolution of two functions: $f_1(x) * f_2(x)$. We will define
the convolution operation later. For now, all you need to know is that finding the convolution
of two functions can be more tedious and error-prone than longhand multiplication!
In this section we introduce transforms that map the function $f_k(x)$ into another
function $\mathcal{F}_k(\omega)$, and that satisfy the property that $\mathcal{F}[f_1(x) * f_2(x)] = \mathcal{F}_1(\omega)\mathcal{F}_2(\omega)$. In
other words, the transform of the convolution is equal to the product of the individual
transforms. Therefore transforms allow us to replace the convolution operation by
the much simpler multiplication operation. The transform expressions introduced in
this section will prove very useful when we consider sums of random variables in
Chapter 7.


The Characteristic Function 


The characteristic function of a random variable $X$ is defined by

$$\Phi_X(\omega) = E[e^{j\omega X}] \tag{4.79a}$$

$$= \int_{-\infty}^{\infty} f_X(x)\,e^{j\omega x}\,dx, \tag{4.79b}$$

where $j = \sqrt{-1}$ is the imaginary unit number. The two expressions on the right-hand
side motivate two interpretations of the characteristic function. In the first expression,
$\Phi_X(\omega)$ can be viewed as the expected value of a function of $X$, $e^{j\omega X}$, in which the parameter
$\omega$ is left unspecified. In the second expression, $\Phi_X(\omega)$ is simply the Fourier
transform of the pdf $f_X(x)$ (with a reversal in the sign of the exponent). Both of these
interpretations prove useful in different contexts.

If we view $\Phi_X(\omega)$ as a Fourier transform, then we have from the Fourier transform
inversion formula that the pdf of $X$ is given by

$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi_X(\omega)\,e^{-j\omega x}\,d\omega. \tag{4.80}$$

It then follows that every pdf and its characteristic function form a unique Fourier
transform pair. Table 4.1 gives the characteristic function of some continuous random
variables.

Example 4.41 Exponential Random Variable

The characteristic function for an exponentially distributed random variable with parameter $\lambda$ is
given by

$$\Phi_X(\omega) = \int_0^\infty \lambda e^{-\lambda x} e^{j\omega x}\,dx = \int_0^\infty \lambda e^{-(\lambda - j\omega)x}\,dx = \frac{\lambda}{\lambda - j\omega}.$$


If $X$ is a discrete random variable, substitution of Eq. (4.20) into the definition of
$\Phi_X(\omega)$ gives

$$\Phi_X(\omega) = \sum_k p_X(x_k)\,e^{j\omega x_k} \qquad \text{discrete random variables.}$$

Most of the time we deal with discrete random variables that are integer-valued. The
characteristic function is then

$$\Phi_X(\omega) = \sum_{k=-\infty}^{\infty} p_X(k)\,e^{j\omega k} \qquad \text{integer-valued random variables.} \tag{4.81}$$

Equation (4.81) is the Fourier transform of the sequence $p_X(k)$. Note that the
Fourier transform in Eq. (4.81) is a periodic function of $\omega$ with period $2\pi$, since
$e^{j(\omega + 2\pi)k} = e^{j\omega k}e^{jk2\pi}$ and $e^{jk2\pi} = 1$. Therefore the characteristic function of integer-valued
random variables is a periodic function of $\omega$. The following inversion formula
allows us to recover the probabilities $p_X(k)$ from $\Phi_X(\omega)$:

$$p_X(k) = \frac{1}{2\pi}\int_0^{2\pi} \Phi_X(\omega)\,e^{-j\omega k}\,d\omega \qquad k = 0, \pm 1, \pm 2, \ldots \tag{4.82}$$

Indeed, a comparison of Eqs. (4.81) and (4.82) shows that the $p_X(k)$ are simply the coefficients
of the Fourier series of the periodic function $\Phi_X(\omega)$.
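The pair formed by Eqs. (4.81) and (4.82) can be exercised on a small pmf. In the Python sketch below, the binomial pmf, the function names, and the number of integration points are illustrative assumptions; the integral in Eq. (4.82) is approximated by a Riemann sum over one period.

```python
import cmath
import math

def characteristic_function(pmf, w):
    """Eq. (4.81) for a pmf given as {integer value: probability}."""
    return sum(p * cmath.exp(1j * w * k) for k, p in pmf.items())

def recover_probability(phi, k, n_points=64):
    """Eq. (4.82), approximated by a Riemann sum over [0, 2*pi)."""
    step = 2 * math.pi / n_points
    total = sum(phi(i * step) * cmath.exp(-1j * i * step * k) for i in range(n_points))
    return (total * step / (2 * math.pi)).real

# Binomial pmf with n = 3 and p = 1/2 (an illustrative choice).
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
phi = lambda w: characteristic_function(pmf, w)

recovered = {k: recover_probability(phi, k) for k in range(-1, 5)}
print(recovered)
```

Because this pmf has only four nonzero terms, the sum with 64 points reproduces the integral essentially exactly; for a pmf with infinite support the sum would only approximate Eq. (4.82).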


Example 4.42 Geometric Random Variable 


The characteristic function for a geometric random variable is given by

$$\Phi_X(\omega) = \sum_{k=0}^{\infty} p q^k e^{j\omega k} = p\sum_{k=0}^{\infty} (q e^{j\omega})^k = \frac{p}{1 - q e^{j\omega}}.$$


Since $f_X(x)$ and $\Phi_X(\omega)$ form a transform pair, we would expect to be able to obtain
the moments of $X$ from $\Phi_X(\omega)$. The moment theorem states that the moments of
$X$ are given by

$$E[X^n] = \frac{1}{j^n}\left.\frac{d^n}{d\omega^n}\Phi_X(\omega)\right|_{\omega = 0}. \tag{4.83}$$

To show this, first expand $e^{j\omega x}$ in a power series in the definition of $\Phi_X(\omega)$:

$$\Phi_X(\omega) = \int_{-\infty}^{\infty} f_X(x)\left\{1 + j\omega x + \frac{(j\omega x)^2}{2!} + \cdots\right\}dx.$$

Assuming that all the moments of $X$ are finite and that the series can be integrated
term by term, we obtain

$$\Phi_X(\omega) = 1 + j\omega E[X] + \frac{(j\omega)^2 E[X^2]}{2!} + \cdots + \frac{(j\omega)^n E[X^n]}{n!} + \cdots.$$

If we differentiate the above expression once and evaluate the result at $\omega = 0$ we obtain

$$\left.\frac{d}{d\omega}\Phi_X(\omega)\right|_{\omega = 0} = jE[X].$$

If we differentiate $n$ times and evaluate at $\omega = 0$, we finally obtain

$$\left.\frac{d^n}{d\omega^n}\Phi_X(\omega)\right|_{\omega = 0} = j^n E[X^n],$$

which yields Eq. (4.83).
Note that when the above power series converges, the characteristic function and
hence the pdf by Eq. (4.80) are completely determined by the moments of $X$.
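The moment theorem can also be applied numerically. The Python sketch below takes the exponential characteristic function $\lambda/(\lambda - j\omega)$ of Example 4.41 and replaces the derivatives in Eq. (4.83) by central finite differences; the step size $h$, the value $\lambda = 2$, and the function names are our own assumptions, and the finite differences are an approximation, not part of the theorem.

```python
def exponential_cf(w, lam):
    """Characteristic function lam / (lam - j*w) of an exponential random variable."""
    return lam / (lam - 1j * w)

def first_two_moments(cf, h=1e-3):
    """Approximate E[X] and E[X^2] from Eq. (4.83) using central differences at w = 0."""
    d1 = (cf(h) - cf(-h)) / (2 * h)                # approximates the first derivative
    d2 = (cf(h) - 2 * cf(0.0) + cf(-h)) / h ** 2   # approximates the second derivative
    first = (d1 / 1j).real
    second = (d2 / 1j ** 2).real
    return first, second

lam = 2.0   # illustrative rate
mean, second_moment = first_two_moments(lambda w: exponential_cf(w, lam))
variance = second_moment - mean ** 2
print(mean, second_moment, variance)
```

For $\lambda = 2$ the results should be close to $1/\lambda = 0.5$, $2/\lambda^2 = 0.5$, and $1/\lambda^2 = 0.25$.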


Example 4.43 
To find the mean of an exponentially distributed random variable, we differentiate
$\Phi_X(\omega) = \lambda(\lambda - j\omega)^{-1}$ once, and obtain

$$\Phi'_X(\omega) = \frac{\lambda j}{(\lambda - j\omega)^2}.$$

The moment theorem then implies that $E[X] = \Phi'_X(0)/j = 1/\lambda$.
If we take two derivatives, we obtain

$$\Phi''_X(\omega) = \frac{-2\lambda}{(\lambda - j\omega)^3},$$

so the second moment is then $E[X^2] = \Phi''_X(0)/j^2 = 2/\lambda^2$. The variance of $X$ is then given by

$$\text{VAR}[X] = E[X^2] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$

Example 4.44 Chernoff Bound for Gaussian Random Variable


Let $X$ be a Gaussian random variable with mean $m$ and variance $\sigma^2$. Find the Chernoff bound
for $X$.
The Chernoff bound (Eq. 4.78) depends on the moment generating function:

$$E[e^{sX}] = \Phi_X(-js).$$

In terms of the characteristic function the bound is given by:

$$P[X \ge a] \le e^{-sa}\,\Phi_X(-js) \qquad \text{for } s \ge 0.$$

The parameter $s$ can be selected to minimize the upper bound.
The bound for the Gaussian random variable is:

$$P[X \ge a] \le e^{-sa}\,e^{ms + \sigma^2 s^2/2} = e^{-s(a - m) + \sigma^2 s^2/2} \qquad \text{for } s \ge 0.$$

We minimize the upper bound by minimizing the exponent:

$$0 = \frac{d}{ds}\left(-s(a - m) + \sigma^2 s^2/2\right) \quad \text{which implies} \quad s = \frac{a - m}{\sigma^2}.$$

The resulting upper bound is:

$$P[X \ge a] = Q\!\left(\frac{a - m}{\sigma}\right) \le e^{-(a - m)^2/2\sigma^2}.$$

This bound is much better than the Chebyshev bound and is similar to the estimate given in
Eq. (4.54).
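To see how the choice of $s$ affects the Chernoff bound in Example 4.44, the Python sketch below evaluates $e^{-s(a - m) + \sigma^2 s^2/2}$ for an arbitrary $s$ and for the minimizing value $s = (a - m)/\sigma^2$, and compares both with the exact tail $Q((a - m)/\sigma)$. The function names and the numerical values are illustrative assumptions.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def chernoff_bound_gaussian(a, m, sigma, s=None):
    """Chernoff bound on P[X >= a] for Gaussian X; s defaults to the minimizing value."""
    if s is None:
        s = max(0.0, (a - m) / sigma ** 2)   # the bound requires s >= 0
    return math.exp(-s * (a - m) + sigma ** 2 * s ** 2 / 2)

a, m, sigma = 3.0, 1.0, 1.0                  # illustrative values, a is 2 sigma above m
exact = q_function((a - m) / sigma)
best = chernoff_bound_gaussian(a, m, sigma)
other = chernoff_bound_gaussian(a, m, sigma, s=1.0)
print(exact, best, other)
```

Any $s \ge 0$ gives a valid bound, but only the minimizing $s$ gives the exponent $-(a - m)^2/2\sigma^2$.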


4.7.2 The Probability Generating Function

In problems where random variables are nonnegative, it is usually more convenient to
use the z-transform or the Laplace transform. The probability generating function
$G_N(z)$ of a nonnegative integer-valued random variable $N$ is defined by

$$G_N(z) = E[z^N] \tag{4.84a}$$

$$= \sum_{k=0}^{\infty} p_N(k)\,z^k. \tag{4.84b}$$

The first expression is the expected value of the function of $N$, $z^N$. The second expression
is the z-transform of the pmf (with a sign change in the exponent). Table 3.1 shows
the probability generating function for some discrete random variables. Note that the
characteristic function of $N$ is given by $\Phi_N(\omega) = G_N(e^{j\omega})$.

Using a derivation similar to that used in the moment theorem, it is easy to show
that the pmf of $N$ is given by

$$p_N(k) = \frac{1}{k!}\left.\frac{d^k}{dz^k}G_N(z)\right|_{z = 0}. \tag{4.85}$$

This is why $G_N(z)$ is called the probability generating function. By taking the first two
derivatives of $G_N(z)$ and evaluating the result at $z = 1$, it is possible to find the first
two moments of $N$:

$$\left.\frac{d}{dz}G_N(z)\right|_{z = 1} = \left.\sum_{k=0}^{\infty} p_N(k)\,k z^{k-1}\right|_{z = 1} = \sum_{k=0}^{\infty} k\,p_N(k) = E[N]$$

and

$$\left.\frac{d^2}{dz^2}G_N(z)\right|_{z = 1} = \left.\sum_{k=0}^{\infty} p_N(k)\,k(k - 1) z^{k-2}\right|_{z = 1} = \sum_{k=0}^{\infty} k(k - 1)\,p_N(k) = E[N(N - 1)] = E[N^2] - E[N].$$

Thus the mean and variance of $N$ are given by

$$E[N] = G'_N(1) \tag{4.86}$$

and

$$\text{VAR}[N] = G''_N(1) + G'_N(1) - (G'_N(1))^2. \tag{4.87}$$


Example 4.45 Poisson Random Variable 


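The probability generating function for the Poisson random variable with parameter $\alpha$ is given by

$$G_N(z) = \sum_{k=0}^{\infty} \frac{\alpha^k}{k!}e^{-\alpha}z^k = e^{-\alpha}\sum_{k=0}^{\infty}\frac{(\alpha z)^k}{k!} = e^{-\alpha}e^{\alpha z} = e^{\alpha(z - 1)}.$$

The first two derivatives of $G_N(z)$ are given by

$$G'_N(z) = \alpha e^{\alpha(z - 1)}$$

and

$$G''_N(z) = \alpha^2 e^{\alpha(z - 1)}.$$

Therefore the mean and variance of the Poisson are

$$E[N] = \alpha$$

$$\text{VAR}[N] = \alpha^2 + \alpha - \alpha^2 = \alpha.$$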
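Equations (4.86) and (4.87) can be evaluated directly from a pmf, since the derivatives of $G_N(z)$ at $z = 1$ reduce to the sums $\sum_k k\,p_N(k)$ and $\sum_k k(k - 1)\,p_N(k)$. The Python sketch below does this for a Poisson pmf truncated after 100 terms; the truncation length, the value $\alpha = 3$, and the function names are assumptions made for illustration.

```python
import math

def poisson_pmf(alpha, n_terms=100):
    """First n_terms Poisson probabilities, built recursively to avoid large factorials."""
    probs = [math.exp(-alpha)]
    for k in range(1, n_terms):
        probs.append(probs[-1] * alpha / k)
    return probs

def mean_and_variance_from_pgf(pmf):
    """Mean and variance from the first two derivatives of the pgf at z = 1."""
    g1 = sum(k * p for k, p in enumerate(pmf))             # first derivative at z = 1
    g2 = sum(k * (k - 1) * p for k, p in enumerate(pmf))   # second derivative at z = 1
    return g1, g2 + g1 - g1 ** 2                           # Eqs. (4.86) and (4.87)

mean, variance = mean_and_variance_from_pgf(poisson_pmf(3.0))
print(mean, variance)
```

Both values should be close to $\alpha = 3$, as found in Example 4.45.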


4.7.3. The Laplace Transform of the pdf 


In queueing theory one deals with service times, waiting times, and delays. All of these 
are nonnegative continuous random variables. It is therefore customary to work with 
the Laplace transform of the pdf, 


$$X^*(s) = \int_0^\infty f_X(x)\,e^{-sx}\,dx = E[e^{-sX}]. \tag{4.88}$$

Note that $X^*(s)$ can be interpreted as a Laplace transform of the pdf or as an expected
value of a function of $X$, $e^{-sX}$.

The moment theorem also holds for $X^*(s)$:

$$E[X^n] = (-1)^n\left.\frac{d^n}{ds^n}X^*(s)\right|_{s = 0}. \tag{4.89}$$


Example 4.46 Gamma Random Variable 


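The Laplace transform of the gamma pdf is given by

$$X^*(s) = \int_0^\infty \frac{\lambda^\alpha x^{\alpha - 1}e^{-\lambda x}e^{-sx}}{\Gamma(\alpha)}\,dx = \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha - 1}e^{-(\lambda + s)x}\,dx$$

$$= \frac{\lambda^\alpha}{\Gamma(\alpha)}\,\frac{1}{(\lambda + s)^\alpha}\int_0^\infty y^{\alpha - 1}e^{-y}\,dy = \frac{\lambda^\alpha}{(\lambda + s)^\alpha},$$

where we used the change of variable $y = (\lambda + s)x$. We can then obtain the first two moments
of $X$ as follows:

$$E[X] = -\left.\frac{d}{ds}\frac{\lambda^\alpha}{(\lambda + s)^\alpha}\right|_{s = 0} = \left.\frac{\alpha\lambda^\alpha}{(\lambda + s)^{\alpha + 1}}\right|_{s = 0} = \frac{\alpha}{\lambda}$$

and

$$E[X^2] = \left.\frac{d^2}{ds^2}\frac{\lambda^\alpha}{(\lambda + s)^\alpha}\right|_{s = 0} = \left.\frac{\alpha(\alpha + 1)\lambda^\alpha}{(\lambda + s)^{\alpha + 2}}\right|_{s = 0} = \frac{\alpha(\alpha + 1)}{\lambda^2}.$$

Thus the variance of $X$ is

$$\text{VAR}[X] = E[X^2] - E[X]^2 = \frac{\alpha(\alpha + 1)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2}.$$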


4.8 BASIC RELIABILITY CALCULATIONS


In this section we apply some of the tools developed so far to the calculation of 
measures that are of interest in assessing the reliability of systems. We also show 
how the reliability of a system can be determined in terms of the reliability of its 
components. 


The Failure Rate Function 


Let T be the lifetime of a component, a subsystem, or a system. The reliability at time t 
is defined as the probability that the component, subsystem, or system is still function- 
ing at time t: 


$$R(t) = P[T > t]. \tag{4.90}$$

The relative frequency interpretation implies that, in a large number of components or
systems, $R(t)$ is the fraction that fail after time $t$. The reliability can be expressed in
terms of the cdf of $T$:

$$R(t) = 1 - P[T \le t] = 1 - F_T(t). \tag{4.91}$$

Note that the derivative of $R(t)$ gives the negative of the pdf of $T$:

$$R'(t) = -f_T(t). \tag{4.92}$$

The mean time to failure (MTTF) is given by the expected value of $T$:

$$E[T] = \int_0^\infty t f_T(t)\,dt = \int_0^\infty R(t)\,dt,$$

where the second expression was obtained using Eqs. (4.28) and (4.91).
Suppose that we know a system is still functioning at time $t$; what is its future behavior?
In Example 4.10, we found that the conditional cdf of $T$ given that $T > t$ is
given by

$$F_T(x \mid T > t) = P[T \le x \mid T > t] = \begin{cases} 0 & x < t \\ \dfrac{F_T(x) - F_T(t)}{1 - F_T(t)} & x \ge t. \end{cases} \tag{4.93}$$

The pdf associated with $F_T(x \mid T > t)$ is

$$f_T(x \mid T > t) = \frac{f_T(x)}{1 - F_T(t)} \qquad x \ge t. \tag{4.94}$$

Note that the denominator of Eq. (4.94) is equal to $R(t)$.
The failure rate function $r(t)$ is defined as $f_T(x \mid T > t)$ evaluated at $x = t$:

$$r(t) = f_T(t \mid T > t) = \frac{-R'(t)}{R(t)}, \tag{4.95}$$

since by Eq. (4.92), $R'(t) = -f_T(t)$. The failure rate function has the following meaning:

$$P[t < T \le t + dt \mid T > t] = f_T(t \mid T > t)\,dt = r(t)\,dt. \tag{4.96}$$

In words, $r(t)\,dt$ is the probability that a component that has functioned up to time $t$ will
fail in the next $dt$ seconds.


Example 4.47 Exponential Failure Law 


Suppose a component has a constant failure rate function, say $r(t) = \lambda$. Find the pdf and the
MTTF for its lifetime $T$.
Equation (4.95) implies that

$$\frac{R'(t)}{R(t)} = -\lambda. \tag{4.97}$$

Equation (4.97) is a first-order differential equation with initial condition $R(0) = 1$. If we
integrate both sides of Eq. (4.97) from 0 to $t$, we obtain

$$-\int_0^t \lambda\,dt' + k = \int_0^t \frac{R'(t')}{R(t')}\,dt' = \ln R(t),$$

which implies that

$$R(t) = Ke^{-\lambda t}, \qquad \text{where } K = e^k.$$

The initial condition $R(0) = 1$ implies that $K = 1$. Thus

$$R(t) = e^{-\lambda t} \qquad t > 0 \tag{4.98}$$

and

$$f_T(t) = \lambda e^{-\lambda t} \qquad t > 0.$$

Thus if $T$ has a constant failure rate function, then $T$ is an exponential random variable. This is
not surprising, since the exponential random variable satisfies the memoryless property. The
MTTF $= E[T] = 1/\lambda$.


The derivation that was used in Example 4.47 can be used to show that, in general,
the failure rate function and the reliability are related by

$$R(t) = \exp\left\{-\int_0^t r(t')\,dt'\right\} \tag{4.99}$$

and from Eq. (4.92),

$$f_T(t) = r(t)\exp\left\{-\int_0^t r(t')\,dt'\right\}. \tag{4.100}$$


Figure 4.16 shows the failure rate function for a typical system. Initially there may 
be a high failure rate due to defective parts or installation. After the “bugs” have been 
worked out, the system is stable and has a low failure rate. At some later point, ageing 
and wear effects set in, resulting in an increased failure rate. Equations (4.99) and 
(4.100) allow us to postulate reliability functions and the associated pdf’s in terms of 
the failure rate function, as shown in the following example. 


FIGURE 4.16
Failure rate function for a typical system.


Chapter 4 One Random Variable 


Example 4.48 Weibull Failure Law 
The Weibull failure law has failure rate function given by

$$r(t) = \alpha\beta t^{\beta - 1}, \tag{4.101}$$

where $\alpha$ and $\beta$ are positive constants. Equation (4.99) implies that the reliability is given by

$$R(t) = e^{-\alpha t^\beta}.$$

Equation (4.100) then implies that the pdf for $T$ is

$$f_T(t) = \alpha\beta t^{\beta - 1}e^{-\alpha t^\beta} \qquad t > 0. \tag{4.102}$$

Figure 4.17 shows $f_T(t)$ for $\alpha = 1$ and several values of $\beta$. Note that $\beta = 1$ yields the exponential
failure law, which has a constant failure rate. For $\beta > 1$, Eq. (4.101) gives a failure rate
function that increases with time. For $\beta < 1$, Eq. (4.101) gives a failure rate function that decreases
with time. Further properties of the Weibull random variable are developed in the
problems.
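Equations (4.99) and (4.100) lend themselves to numerical evaluation when the integral of $r(t)$ has no convenient closed form. The Python sketch below integrates a failure rate function with the midpoint rule (our choice of quadrature; the step count and function names are likewise assumptions) and checks the result against the Weibull reliability $e^{-\alpha t^\beta}$.

```python
import math

def reliability_from_failure_rate(r, t, steps=10_000):
    """Eq. (4.99): R(t) = exp(-integral of r from 0 to t), using the midpoint rule."""
    dt = t / steps
    integral = sum(r((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-integral)

def pdf_from_failure_rate(r, t, steps=10_000):
    """Eq. (4.100): f_T(t) = r(t) R(t)."""
    return r(t) * reliability_from_failure_rate(r, t, steps)

def weibull_rate(alpha, beta):
    """Weibull failure rate function of Eq. (4.101)."""
    return lambda t: alpha * beta * t ** (beta - 1)

alpha, beta, t = 1.0, 2.0, 1.5          # illustrative values
r = weibull_rate(alpha, beta)
numeric_R = reliability_from_failure_rate(r, t)
closed_form_R = math.exp(-alpha * t ** beta)
print(numeric_R, closed_form_R)
```

For $\beta < 1$ the failure rate is unbounded at $t = 0$, so a simple quadrature such as this one converges slowly there and should be used with care.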


4.8.2 Reliability of Systems


Suppose that a system consists of several components or subsystems. We now show 
how the reliability of a system can be computed in terms of the reliability of its subsys- 
tems if the components are assumed to fail independently of each other. 


FIGURE 4.17
Probability density function of Weibull random variable, $\alpha = 1$ and $\beta = 1, 2, 4$.

FIGURE 4.18
(a) System consisting of $n$ components in series. (b) System consisting
of $n$ components in parallel.


Consider first a system that consists of the series arrangement of $n$ components
as shown in Fig. 4.18(a). This system is considered to be functioning only if all the components
are functioning. Let $A_s$ be the event “system functioning at time $t$,” and let $A_j$
be the event “$j$th component is functioning at time $t$”; then the probability that the system
is functioning at time $t$ is

$$R(t) = P[A_s] = P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]P[A_2]\cdots P[A_n] = R_1(t)R_2(t)\cdots R_n(t), \tag{4.103}$$

since $P[A_j] = R_j(t)$, the reliability function of the $j$th component. Since probabilities
are numbers that are less than or equal to one, we see that $R(t)$ can be no more reliable
than the least reliable of the components, that is, $R(t) \le \min_j R_j(t)$.

If we apply Eq. (4.99) to each of the $R_j(t)$ in Eq. (4.103), we then find that the failure
rate function of a series system is given by the sum of the component failure rate
functions:

$$R(t) = \exp\left\{-\int_0^t r_1(t')\,dt'\right\}\exp\left\{-\int_0^t r_2(t')\,dt'\right\}\cdots\exp\left\{-\int_0^t r_n(t')\,dt'\right\}$$

$$= \exp\left\{-\int_0^t [r_1(t') + r_2(t') + \cdots + r_n(t')]\,dt'\right\}.$$

Example 4.49 


Suppose that a system consists of $n$ components in series and that the component lifetimes are
exponential random variables with rates $\lambda_1, \lambda_2, \ldots, \lambda_n$. Find the system reliability.
From Eqs. (4.98) and (4.103), we have

$$R(t) = e^{-\lambda_1 t}e^{-\lambda_2 t}\cdots e^{-\lambda_n t} = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)t}.$$

Thus the system lifetime is exponentially distributed with rate $\lambda_1 + \lambda_2 + \cdots + \lambda_n$.


Now suppose that a system consists of $n$ components in parallel, as shown in
Fig. 4.18(b). This system is considered to be functioning as long as at least one of the
components is functioning. The system will not be functioning if and only if all the
components have failed, that is,

$$P[A_s^c] = P[A_1^c]P[A_2^c]\cdots P[A_n^c].$$

Thus

$$1 - R(t) = (1 - R_1(t))(1 - R_2(t))\cdots(1 - R_n(t)),$$

and finally,

$$R(t) = 1 - (1 - R_1(t))(1 - R_2(t))\cdots(1 - R_n(t)). \tag{4.104}$$

Example 4.50 


Compare the reliability of a single-unit system against that of a system that operates two units in
parallel. Assume all units have exponentially distributed lifetimes with rate 1.
The reliability of the single-unit system is

$$R_s(t) = e^{-t}.$$

The reliability of the two-unit system is

$$R_p(t) = 1 - (1 - e^{-t})(1 - e^{-t}) = e^{-t}(2 - e^{-t}).$$

The parallel system is more reliable by a factor of

$$(2 - e^{-t}) > 1.$$
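Equations (4.103) and (4.104) translate directly into code. The following Python sketch (function names and numerical values are illustrative assumptions) evaluates series and parallel reliabilities at a fixed time $t$ from a list of component reliabilities, under the same independence assumption used above.

```python
import math

def series_reliability(component_reliabilities):
    """Eq. (4.103): the system works only if every component works."""
    return math.prod(component_reliabilities)

def parallel_reliability(component_reliabilities):
    """Eq. (4.104): the system fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in component_reliabilities)

t = 1.0                                   # illustrative time
r_single = math.exp(-t)                   # one unit with rate 1
r_parallel = parallel_reliability([r_single, r_single])
improvement = r_parallel / r_single       # should equal 2 - exp(-t)

# Two exponential components with rates 1 and 2 in series.
series_parts = [math.exp(-1.0 * t), math.exp(-2.0 * t)]
r_series = series_reliability(series_parts)
print(r_parallel, improvement, r_series)
```

More complex configurations can be handled by nesting these two functions, passing the reliability of a subsystem as one of the component reliabilities of a larger system.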


More complex configurations can be obtained by combining subsystems consisting 
of series and parallel components. The reliability of such systems can then be computed in 
terms of the subsystem reliabilities. See Example 2.35 for an example of such a calculation. 


COMPUTER METHODS FOR GENERATING RANDOM VARIABLES 


The computer simulation of any random phenomenon involves the generation of ran- 
dom variables with prescribed distributions. For example, the simulation of a queueing 
system involves generating the time between customer arrivals as well as the service 
times of each customer. Once the cdf’s that model these random quantities have been 
selected, an algorithm for generating random variables with these cdf’s must be found. 
MATLAB and Octave have built-in functions for generating random variables for all
of the well-known distributions. In this section we present the methods that are used
for generating random variables. All of these methods are based on the availability of 
random numbers that are uniformly distributed between zero and one. Methods for 
generating these numbers were discussed in Section 2.7. 

All of the methods for generating random variables require the evaluation of ei- 
ther the pdf, the cdf, or the inverse of the cdf of the random variable of interest. We can 
write programs to perform these evaluations, or we can use the functions available in 
programs such as MATLAB and Octave. The following example shows some typical 
evaluations for the Gaussian random variable. 


Example 4.51 Evaluation of pdf, cdf, and Inverse cdf 


Let X be a Gaussian random variable with mean 1 and variance 2. Find the pdf at x = 7. Find the 
cdf at x = —2. Find the value of x at which the cdf = 0.25. 
The following commands show how these results are obtained using Octave. 


>normal_pdf (7, 1, 2) 
ans = 3.4813e-05 
>normal_cdf (-2, 1, 2)
ans = 0.016947 
>normal_inv (0.25, 1, 2) 
ans = 0.046127 
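The same three evaluations can be sketched in Python with the standard-library class `statistics.NormalDist`. Note one difference in convention: the Octave calls above appear to take the variance as their third argument, whereas `NormalDist` takes the standard deviation, so we pass $\sqrt{2}$. The variable names are ours.

```python
import math
from statistics import NormalDist

# Gaussian random variable with mean 1 and variance 2 (sigma is the standard deviation).
X = NormalDist(mu=1.0, sigma=math.sqrt(2.0))

pdf_at_7 = X.pdf(7)              # pdf evaluated at x = 7
cdf_at_minus_2 = X.cdf(-2)       # cdf evaluated at x = -2
x_at_quarter = X.inv_cdf(0.25)   # value of x at which the cdf equals 0.25

print(pdf_at_7, cdf_at_minus_2, x_at_quarter)
```

The three results should agree with the Octave output shown above to the displayed precision.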


The Transformation Method 


Suppose that $U$ is uniformly distributed in the interval $[0, 1]$. Let $F_X(x)$ be the cdf of
the random variable we are interested in generating. Define the random variable
$Z = F_X^{-1}(U)$; that is, first $U$ is selected and then $Z$ is found as indicated in Fig. 4.19. The
cdf of $Z$ is

$$P[Z \le x] = P[F_X^{-1}(U) \le x] = P[U \le F_X(x)].$$

But if $U$ is uniformly distributed in $[0, 1]$ and $0 \le h \le 1$, then $P[U \le h] = h$ (see
Example 4.6). Thus

$$P[Z \le x] = F_X(x),$$

and $Z = F_X^{-1}(U)$ has the desired cdf.

Transformation Method for Generating X:

1. Generate $U$ uniformly distributed in $[0, 1]$.
2. Let $Z = F_X^{-1}(U)$.


Example 4.52 Exponential Random Variable 


To generate an exponentially distributed random variable $X$ with parameter $\lambda$, we need to invert
the expression $u = F_X(x) = 1 - e^{-\lambda x}$. We obtain

$$X = -\frac{1}{\lambda}\ln(1 - U).$$

FIGURE 4.19
Transformation method for generating a random variable with cdf $F_X(x)$.

Note that we can use the simpler expression $X = -\ln(U)/\lambda$, since $1 - U$ is also uniformly
distributed in $[0, 1]$. The first two lines of the Octave commands below show how to implement
the transformation method to generate 1000 exponential random variables with $\lambda = 1$. Figure
4.20 shows the histogram of values obtained. In addition, the figure shows the probability that
samples of the random variables fall in the corresponding histogram bins. Good correspondence
between the histogram and these probabilities is observed. In Chapter 8 we introduce methods
for assessing the goodness-of-fit of data to a given distribution. Both MATLAB and Octave
use the transformation method in their function exponential_rnd.


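>U=rand(1, 1000);          % Generate 1000 uniform random variables.
>X=-log(U);                % Compute 1000 exponential RVs.
>K=0.25:0.5:6;             % Centers of the 12 histogram bins of width 0.5.
>P(1)=1-exp(-0.5);         % Probability that a sample falls in the first bin.
>for i=2:12,               % The remaining lines show how to generate
>P(i)=P(i-1)*exp(-0.5);    % the histogram bins.
>end;
>stem(K, P)                % Plot the bin probabilities.
>hold on
>hist(X, K, 1)             % Overlay the histogram, normalized to sum to 1.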
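For readers working outside Octave, the same experiment can be sketched in Python using only the standard library. The seed, the function name, and the text-only comparison of relative frequencies with bin probabilities (in place of the plot) are our own assumptions.

```python
import math
import random

def exponential_by_transformation(lam, rng):
    """Transformation method: X = -ln(U)/lam with U uniform on (0, 1]."""
    u = 1.0 - rng.random()          # shift [0, 1) to (0, 1] so that log(0) cannot occur
    return -math.log(u) / lam

rng = random.Random(2024)           # fixed seed so the run is repeatable
samples = [exponential_by_transformation(1.0, rng) for _ in range(1000)]

# Relative frequencies in 12 bins of width 0.5, as in the Octave commands above.
bin_width, n_bins = 0.5, 12
counts = [0] * n_bins
for x in samples:
    i = int(x / bin_width)
    if i < n_bins:
        counts[i] += 1
relative_freq = [c / len(samples) for c in counts]

# Probability that an exponential sample with rate 1 falls in each bin.
bin_prob = [(1 - math.exp(-bin_width)) * math.exp(-bin_width * i) for i in range(n_bins)]

for f, p in zip(relative_freq, bin_prob):
    print(round(f, 3), round(p, 3))
```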


The Rejection Method 


We first consider the simple version of this algorithm and explain why it works; then
we present it in its general form. Suppose that we are interested in generating a random
variable $Z$ with pdf $f_X(x)$ as shown in Fig. 4.21. In particular, we assume that: (1)
the pdf is nonzero only in the interval $[0, a]$, and (2) the pdf takes on values in the
range $[0, b]$. The rejection method in this case works as follows:

1. Generate $X_1$ uniform in the interval $[0, a]$.
2. Generate $Y$ uniform in the interval $[0, b]$.
3. If $Y \le f_X(X_1)$, then output $Z = X_1$; else, reject $X_1$ and return to step 1.

Note that this algorithm will perform a random number of steps before it produces the
output $Z$.

FIGURE 4.20
Histogram of 1000 exponential random variables using transformation method.

FIGURE 4.21
Rejection method for generating a random variable with pdf $f_X(x)$.

We now show that the output $Z$ has the desired pdf. Steps 1 and 2 select a point at
random in a rectangle of width $a$ and height $b$. The probability of selecting a point in
any region is simply the area of the region divided by the total area of the rectangle, $ab$.
Thus the probability of accepting $X_1$ is the area of the region below $f_X(x)$ divided
by $ab$. But the area under any pdf is 1, so we conclude that the probability of success
(i.e., acceptance) is $1/ab$. Consider now the following probability:

$$P[x < X_1 \le x + dx \mid X_1 \text{ is accepted}] = \frac{P[\{x < X_1 \le x + dx\} \cap \{X_1 \text{ accepted}\}]}{P[X_1 \text{ accepted}]} = \frac{\text{shaded area}/ab}{1/ab} = \frac{f_X(x)\,dx/ab}{1/ab} = f_X(x)\,dx.$$

Therefore $X_1$ when accepted has the desired pdf. Thus $Z$ has the desired pdf.


Example 4.53 Generating Beta Random Variables 


Show that the beta random variable with $a' = b' = 2$ can be generated using the rejection method.
The pdf of the beta random variable with $a' = b' = 2$ is similar to that shown in Fig. 4.21.
This beta pdf is maximum at $x = 1/2$ and the maximum value is:

$$\frac{(1/2)^{2-1}(1/2)^{2-1}}{B(2, 2)} = \frac{1/4}{\Gamma(2)\Gamma(2)/\Gamma(4)} = \frac{1/4}{1!\,1!/3!} = \frac{3}{2}.$$

Therefore we can generate this beta random variable using the rejection method with $b = 1.5$.
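The simple rejection method applied to this beta pdf can be sketched in Python as follows. Since $B(2, 2) = 1/6$, the pdf is $6x(1 - x)$ on $[0, 1]$, so we take $a = 1$ and $b = 1.5$. The function names, the seed, and the sample size are illustrative assumptions; the function also returns the number of trials so that the acceptance probability $1/ab = 2/3$ can be observed.

```python
import random

def beta22_pdf(x):
    """Beta pdf with a' = b' = 2, that is, 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(pdf, a, b, rng):
    """Simple rejection method for a pdf on [0, a] bounded by b.

    Returns the accepted value and the number of trials that were needed.
    """
    trials = 0
    while True:
        trials += 1
        x1 = a * rng.random()        # step 1: X1 uniform in [0, a]
        y = b * rng.random()         # step 2: Y uniform in [0, b]
        if y <= pdf(x1):             # step 3: accept X1 if Y falls under the pdf
            return x1, trials

rng = random.Random(7)               # fixed seed so the run is repeatable
draws = [rejection_sample(beta22_pdf, a=1.0, b=1.5, rng=rng) for _ in range(20_000)]
sample_mean = sum(x for x, _ in draws) / len(draws)
mean_trials = sum(t for _, t in draws) / len(draws)
print(sample_mean, mean_trials)
```

The sample mean should be near $1/2$, and the average number of trials per accepted value should be near $ab = 1.5$.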


The algorithm as stated above can have two problems. First, if the rectangle does
not fit snugly around $f_X(x)$, the number of $X_1$'s that need to be generated before acceptance
may be excessive. Second, the above method cannot be used if $f_X(x)$ is unbounded
or if its range is not finite. The general version of this algorithm overcomes
both problems. Suppose we want to generate $Z$ with pdf $f_X(x)$. Let $W$ be a random
variable with pdf $f_W(x)$ that is easy to generate and such that for some constant $K > 1$,

$$K f_W(x) \ge f_X(x) \qquad \text{for all } x,$$

that is, the region under $K f_W(x)$ contains $f_X(x)$ as shown in Fig. 4.22.


Rejection Method for Generating X: 


1. Generate $X_1$ with pdf $f_W(x)$. Define $B(X_1) = K f_W(X_1)$.
2. Generate Y uniform in $[0, B(X_1)]$.
3. If $Y \le f_X(X_1)$, then output $Z = X_1$; else reject $X_1$ and return to step 1.


See Problem 4.143 for a proof that Z has the desired pdf. 
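
A minimal Python sketch of the general rejection method follows. The target and covering pdfs are chosen here purely for illustration and are not from the text: the target is the half-normal pdf $f_X(x) = \sqrt{2/\pi}\,e^{-x^2/2}$ for $x \ge 0$, the easy-to-generate W is exponential with parameter 1, and $K = \sqrt{2e/\pi}$ is the smallest constant for which $K f_W(x) \ge f_X(x)$ (equality holds at x = 1). All function names are hypothetical.

```python
import math
import random

def rejection_sample_general(target_pdf, proposal_pdf, proposal_draw, K, rng):
    """One sample with pdf target_pdf, assuming K * proposal_pdf >= target_pdf everywhere."""
    while True:
        x1 = proposal_draw(rng)                        # Step 1: X1 with pdf f_W
        y = rng.uniform(0.0, K * proposal_pdf(x1))     # Step 2: Y uniform in [0, K f_W(X1)]
        if y <= target_pdf(x1):                        # Step 3: accept or reject
            return x1

def half_normal_pdf(x):
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x) if x >= 0.0 else 0.0

def exp_pdf(x):
    return math.exp(-x) if x >= 0.0 else 0.0

def exp_draw(rng):
    # Transformation method; 1 - U lies in (0, 1], so the logarithm is finite.
    return -math.log(1.0 - rng.random())

K_HALF_NORMAL = math.sqrt(2.0 * math.e / math.pi)

rng = random.Random(2)   # fixed seed (assumption) for repeatability
samples = [rejection_sample_general(half_normal_pdf, exp_pdf, exp_draw,
                                    K_HALF_NORMAL, rng) for _ in range(20000)]
half_normal_mean = sum(samples) / len(samples)
print(half_normal_mean, math.sqrt(2.0 / math.pi))
```

On average K candidates are drawn per accepted sample, so a covering function that hugs $f_X(x)$ closely (K near 1) wastes the fewest draws.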
FIGURE 4.22
Rejection method for generating a random variable with gamma pdf and with $0 < \alpha < 1$.


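Example 4.54 Gamma Random Variable

We now show how the rejection method can be used to generate X with gamma pdf and parameters
$0 < \alpha < 1$ and $\lambda = 1$. A function $K f_W(x)$ that "covers" $f_X(x)$ is easily obtained (see Fig. 4.22):

$$f_X(x) = \frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)} \le K f_W(x) = \begin{cases} \dfrac{x^{\alpha-1}}{\Gamma(\alpha)} & 0 \le x \le 1 \\[2mm] \dfrac{e^{-x}}{\Gamma(\alpha)} & x > 1. \end{cases}$$

The pdf $f_W(x)$ that corresponds to the function on the right-hand side is

$$f_W(x) = \begin{cases} \dfrac{\alpha e\, x^{\alpha-1}}{\alpha + e} & 0 \le x \le 1 \\[2mm] \dfrac{\alpha e\, e^{-x}}{\alpha + e} & x > 1. \end{cases}$$

The cdf of W is

$$F_W(x) = \begin{cases} \dfrac{e\, x^{\alpha}}{\alpha + e} & 0 \le x \le 1 \\[2mm] 1 - \dfrac{\alpha e\, e^{-x}}{\alpha + e} & x > 1. \end{cases}$$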
W is easy to generate using the transformation method, with

$$F_W^{-1}(u) = \begin{cases} \left[\dfrac{(\alpha + e)u}{e}\right]^{1/\alpha} & u \le e/(\alpha + e) \\[2mm] -\ln\left[\dfrac{(\alpha + e)(1 - u)}{\alpha e}\right] & u > e/(\alpha + e). \end{cases}$$

We can therefore use the transformation method to generate this $f_W(x)$, and then the
rejection method to generate any gamma random variable X with parameters $0 < \alpha < 1$ and
$\lambda = 1$. Finally we note that if we let $W = X/\lambda$, then W will be gamma with parameters $\alpha$ and
$\lambda$. The generation of gamma random variables with $\alpha > 1$ is discussed in Problem 4.142.


Example 4.55 Implementing Rejection Method for Gamma Random Variables 


Given below is an Octave function definition to implement the rejection method using the above 
transformation. 


% Generate random numbers from the gamma distribution for 0 < alpha < 1 (lambda = 1).
function X = gamma_rejection_method_altone (alpha)
  while (true),
    X = special_inverse (alpha);          % Step 1: Generate X with pdf fW(x).
    B = special_pdf (X, alpha);           % Step 2: Generate Y uniform in [0, K fW(X)].
    Y = rand .* B;
    if (Y <= fx_gamma_pdf (X, alpha)),    % Step 3: Accept or reject...
      break;
    end
  end
endfunction

% Helper function to generate random variables with pdf fW(x) (transformation method).
function X = special_inverse (alpha)
  u = rand;
  if (u <= e ./ (alpha + e)),
    X = ((alpha + e) .* u ./ e) .^ (1 ./ alpha);
  else
    X = -log ((alpha + e) .* (1 - u) ./ (alpha .* e));
  end
endfunction

% Return B = K fW(X) in order to generate uniform variables in [0, K fW(X)].
% K fW(x) = x^(alpha-1)/Gamma(alpha) for 0 <= x <= 1 and e^(-x)/Gamma(alpha) for x > 1.
function B = special_pdf (X, alpha)
  if (X >= 0 && X <= 1),
    B = X .^ (alpha - 1) ./ gamma (alpha);
  elseif (X > 1),
    B = e .^ (-X) ./ gamma (alpha);
  end
endfunction

% pdf of the gamma distribution.
% Could also use the built-in gamma_pdf (X, A, B) function supplied with Octave, setting B = 1.
function y = fx_gamma_pdf (x, alpha)
  y = (x .^ (alpha - 1)) .* (e .^ (-x)) ./ (gamma (alpha));
endfunction


Figure 4.23 shows the histogram of 1000 samples obtained using this function. The figure 
also shows the probability that the samples fall in the bins of the histogram. 
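
For readers who want to cross-check the method outside Octave, the same three steps can be sketched in Python. This is an illustrative port, not the book's code: the function names, the seed, and the choice $\alpha = 0.5$ are assumptions. The check at the end uses the fact that a gamma random variable with $\lambda = 1$ has mean $\alpha$.

```python
import math
import random

def sample_w(alpha, rng):
    """Transformation method for the covering pdf f_W of Example 4.54."""
    u = rng.random()
    e = math.e
    if u <= e / (alpha + e):
        return ((alpha + e) * u / e) ** (1.0 / alpha)
    return -math.log((alpha + e) * (1.0 - u) / (alpha * e))

def cover(x, alpha):
    """K f_W(x): x^(alpha-1)/Gamma(alpha) on (0, 1], e^(-x)/Gamma(alpha) for x > 1."""
    g = math.gamma(alpha)
    return x ** (alpha - 1.0) / g if x <= 1.0 else math.exp(-x) / g

def gamma_pdf(x, alpha):
    """Gamma pdf with lambda = 1."""
    return x ** (alpha - 1.0) * math.exp(-x) / math.gamma(alpha)

def gamma_rejection(alpha, rng):
    """One gamma sample for 0 < alpha < 1, lambda = 1."""
    while True:
        x1 = sample_w(alpha, rng)              # Step 1
        if x1 <= 0.0:
            continue                           # guard: pdf is unbounded at x = 0
        y = rng.random() * cover(x1, alpha)    # Step 2
        if y <= gamma_pdf(x1, alpha):          # Step 3
            return x1

rng = random.Random(3)   # fixed seed (assumption) for repeatability
alpha = 0.5
gamma_samples = [gamma_rejection(alpha, rng) for _ in range(20000)]
gamma_mean = sum(gamma_samples) / len(gamma_samples)
print(gamma_mean)
```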


We have presented the most common methods that are used to generate random
variables. These methods are incorporated in the functions provided by programs
such as MATLAB and Octave, so in practice you do not need to write programs to
generate the most common random variables. You simply need to invoke the
appropriate functions.

FIGURE 4.23
1000 samples of gamma random variable using rejection method.


Example 4.56 Generating Gamma Random Variables 


Use Octave to obtain eight gamma random variables with $\alpha = 0.25$ and $\lambda = 1$.
The Octave command and the corresponding answer are given below: 


> gamma_rnd (0.25, 1, 1, 8) 

ans = 

Columns 1 through 6: 
0.00021529 0.09331491 0.24606757 0.08665787 
0.00013400 0.23384718 

Columns 7 and 8: 
1.72940941 1.29599702 


4.9.3 Generation of Functions of a Random Variable 


Once we have a simple method of generating a random variable X, we can easily generate
any random variable that is defined by Y = g(X) or even $Z = h(X_1, X_2, \ldots, X_n)$,
where $X_1, \ldots, X_n$ are n outputs of the random variable generator.




Example 4.57 m-Erlang Random Variable 


Let $X_1, X_2, \ldots$ be independent, exponentially distributed random variables with parameter $\lambda$.
In Chapter 7 we show that the random variable

$$Y = X_1 + X_2 + \cdots + X_m$$

has an m-Erlang pdf with parameter $\lambda$. We can therefore generate an m-Erlang random variable
by first generating m exponentially distributed random variables using the transformation
method, and then taking the sum. Since the m-Erlang random variable is a special case of the
gamma random variable, for large m it may be preferable to use the rejection method described
in Problem 4.142.
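
The sum-of-exponentials construction can be sketched as follows. The function names, the seed, and the values m = 3 and $\lambda = 2$ are illustrative assumptions; the sample mean is compared with the m-Erlang mean $m/\lambda$.

```python
import math
import random

def exponential(lam, rng):
    """Transformation method: X = -ln(1 - U) / lambda, with U uniform in [0, 1)."""
    return -math.log(1.0 - rng.random()) / lam

def erlang(m, lam, rng):
    """m-Erlang sample as the sum of m independent exponential samples."""
    return sum(exponential(lam, rng) for _ in range(m))

rng = random.Random(4)   # fixed seed (assumption) for repeatability
m, lam = 3, 2.0
erlang_samples = [erlang(m, lam, rng) for _ in range(20000)]
erlang_mean = sum(erlang_samples) / len(erlang_samples)
print(erlang_mean, m / lam)
```

Each sample costs m uniform draws and m logarithms, which is why a rejection method can be preferable when m is large.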


4.9.4 Generating Mixtures of Random Variables


We have seen in previous sections that sometimes a random variable consists of a mix- 
ture of several random variables. In other words, the generation of the random variable 
can be viewed as first selecting a random variable type according to some pmf, and 
then generating a random variable from the selected pdf type. This procedure can be 
simulated easily. 


Example 4.58 Hyperexponential Random Variable 
A two-stage hyperexponential random variable has pdf

$$f_X(x) = p\,a e^{-ax} + (1 - p)\,b e^{-bx}.$$

It is clear from the above expression that X consists of a mixture of two exponential random
variables with parameters a and b, respectively. X can be generated by first performing a
Bernoulli trial with probability of success p. If the outcome is a success, we then use the
transformation method to generate an exponential random variable with parameter a. If the outcome is
a failure, we generate an exponential random variable with parameter b instead.
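
A short Python sketch of this two-step procedure is given below. The function name, the seed, and the parameter values p = 0.3, a = 1, b = 5 are assumptions chosen for illustration; the sample mean is compared with the mixture mean $p/a + (1-p)/b$.

```python
import math
import random

def hyperexponential(p, a, b, rng):
    """Bernoulli trial selects the rate, then the transformation method draws the sample."""
    rate = a if rng.random() < p else b             # success with probability p
    return -math.log(1.0 - rng.random()) / rate     # exponential with the chosen rate

rng = random.Random(5)   # fixed seed (assumption) for repeatability
p, a, b = 0.3, 1.0, 5.0
hyper_samples = [hyperexponential(p, a, b, rng) for _ in range(40000)]
hyper_mean = sum(hyper_samples) / len(hyper_samples)
print(hyper_mean, p / a + (1.0 - p) / b)
```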


*4.10 ENTROPY


Entropy is a measure of the uncertainty in a random experiment. In this section, we 
first introduce the notion of the entropy of a random variable and develop several of 
its fundamental properties. We then show that entropy quantifies uncertainty by the 
amount of information required to specify the outcome of a random experiment. Fi- 
nally, we discuss the method of maximum entropy, which has found wide use in charac- 
terizing random variables when only some parameters, such as the mean or variance, 
are known. 


4.10.1 The Entropy of a Random Variable


Let X be a discrete random variable with $S_X = \{1, 2, \ldots, K\}$ and pmf $p_k = P[X = k]$.
We are interested in quantifying the uncertainty of the event $A_k = \{X = k\}$. Clearly, the
uncertainty of $A_k$ is low if the probability of $A_k$ is close to one, and it is high if the
probability of $A_k$ is small. The following measure of uncertainty satisfies these two
properties:

$$I(X = k) = \ln \frac{1}{P[X = k]} = -\ln P[X = k]. \qquad (4.105)$$

Note from Fig. 4.24 that $I(X = k) = 0$ if $P[X = k] = 1$, and $I(X = k)$ increases with
decreasing $P[X = k]$. The entropy of a random variable X is defined as the expected
value of the uncertainty of its outcomes:

$$H_X = E[I(X)] = \sum_{k=1}^{K} P[X = k] \ln \frac{1}{P[X = k]} = -\sum_{k=1}^{K} P[X = k] \ln P[X = k]. \qquad (4.106)$$

Note that in the above definition we have used I(X) as a function of a random variable. We
say that entropy is in units of "bits" when the logarithm is base 2. In the above expression
we are using the natural logarithm, so we say the units are in "nats." Changing the base of
the logarithm is equivalent to multiplying entropy by a constant, since $\ln(x) = \ln 2 \cdot \log_2 x$.


Example 4.59 Entropy of a Binary Random Variable 


Suppose that $S_X = \{0, 1\}$ and $p = P[X = 0] = 1 - P[X = 1]$. Figure 4.25 shows $-p \ln(p)$,
$-(1 - p)\ln(1 - p)$, and the entropy of the binary random variable
$H_X = h(p) = -p \ln(p) - (1 - p)\ln(1 - p)$ as functions of p. Note that h(p) is symmetric about p = 1/2 and that
it achieves its maximum at p = 1/2. Note also how the uncertainties of the events {X = 0} and
{X = 1} vary together in complementary fashion: When P[X = 0] is very small (i.e., highly
uncertain), then P[X = 1] is close to one (i.e., highly certain), and vice versa. Thus the highest
average uncertainty occurs when P[X = 0] = P[X = 1] = 1/2.

$H_X$ can be viewed as the average uncertainty that is resolved by observing X. This suggests
that if we are designing a binary experiment (for example, a yes/no question), then the average
uncertainty that is resolved will be maximized if the two outcomes are designed to be equiprobable.
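
The binary entropy function h(p) is simple to evaluate numerically. The sketch below (function name and the optional `base` argument are assumptions) checks the symmetry about p = 1/2 and the location of the maximum on a grid.

```python
import math

def binary_entropy(p, base=math.e):
    """h(p) = -p log p - (1 - p) log(1 - p); nats by default, bits when base = 2."""
    if p <= 0.0 or p >= 1.0:
        return 0.0                      # limit value: x log x -> 0 as x -> 0
    nats = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
    return nats / math.log(base)

grid = [i / 100.0 for i in range(101)]
best_p = max(grid, key=binary_entropy)   # grid point with the largest entropy
print(best_p, binary_entropy(0.5), binary_entropy(0.5, base=2))
```

Dividing by `math.log(base)` is the change-of-base rule noted above, so the same function reports entropy in nats or in bits.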


FIGURE 4.24
$\ln(1/x) \ge 1 - x$

FIGURE 4.25
Entropy of binary random variable.


Example 4.60 Reduction of Entropy Through Partial Information 


The binary representation of the random variable X takes on values from the set {000, 001, 
010,..., 111} with equal probabilities. Find the reduction in the entropy of X given the event 
A = {X begins with a 1}. 

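The entropy of X is

$$H_X = -\frac{1}{8}\log_2\frac{1}{8} - \frac{1}{8}\log_2\frac{1}{8} - \cdots - \frac{1}{8}\log_2\frac{1}{8} = 3 \text{ bits}.$$

The event A implies that X is in the set {100, 101, 110, 111}, so the entropy of X given A is

$$H_{X|A} = -\frac{1}{4}\log_2\frac{1}{4} - \cdots - \frac{1}{4}\log_2\frac{1}{4} = 2 \text{ bits}.$$

Thus the reduction in entropy is $H_X - H_{X|A} = 3 - 2 = 1$ bit.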


Let $p = (p_1, p_2, \ldots, p_K)$ and $q = (q_1, q_2, \ldots, q_K)$ be two pmf's. The relative
entropy of q with respect to p is defined by

$$H(p; q) = \sum_{k=1}^{K} p_k \ln \frac{1}{q_k} - H_X = \sum_{k=1}^{K} p_k \ln \frac{p_k}{q_k}. \qquad (4.107)$$

The relative entropy is nonnegative, and equal to zero if and only if $p_k = q_k$ for all k:

$$H(p; q) \ge 0 \quad \text{with equality iff } p_k = q_k \text{ for } k = 1, \ldots, K. \qquad (4.108)$$


We will use this fact repeatedly in the remainder of this section. 
To show that the relative entropy is nonnegative, we use the inequality
$\ln(1/x) \ge 1 - x$ with equality iff x = 1, as shown in Fig. 4.24. Equation (4.107) then
becomes

$$H(p; q) = \sum_{k=1}^{K} p_k \ln \frac{p_k}{q_k} \ge \sum_{k=1}^{K} p_k \left(1 - \frac{q_k}{p_k}\right) = \sum_{k=1}^{K} p_k - \sum_{k=1}^{K} q_k = 0. \qquad (4.109)$$

In order for equality to hold in the above expression, we must have $p_k = q_k$ for
$k = 1, \ldots, K$.

Let X be any random variable with $S_X = \{1, 2, \ldots, K\}$ and pmf p. If we let
$q_k = 1/K$ in Eq. (4.108), then

$$H(p; q) = \ln K - H_X = \sum_{k=1}^{K} p_k \ln \frac{p_k}{1/K} \ge 0,$$

which implies that for any random variable X with $S_X = \{1, 2, \ldots, K\}$,

$$H_X \le \ln K \quad \text{with equality iff } p_k = \frac{1}{K}, \; k = 1, \ldots, K. \qquad (4.110)$$

Thus the maximum entropy attainable by the random variable X is $\ln K$, and this
maximum is attained when all the outcomes are equiprobable.
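
The identity $H(p; q) = \ln K - H_X$ for uniform q, and the bound $H_X \le \ln K$, can be checked numerically. In this sketch the function names and the example pmf are assumptions chosen for illustration.

```python
import math

def entropy(p):
    """Entropy in nats of a pmf given as a list of probabilities."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)

def relative_entropy(p, q):
    """H(p; q) = sum_k p_k ln(p_k / q_k), assuming q_k > 0 wherever p_k > 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)

p = [0.5, 0.25, 0.125, 0.125]        # illustrative pmf (assumption)
K = len(p)
uniform = [1.0 / K] * K

gap = relative_entropy(p, uniform)   # equals ln K - H_X
print(entropy(p), math.log(K), gap)
```

The gap is zero only when p itself is uniform, which is the equality condition in Eq. (4.110).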

Equation (4.110) shows that the entropy of random variables with finite $S_X$ is
always finite. On the other hand, it also shows that as the size of $S_X$ is increased, the
entropy can increase without bound. The following example shows that some countably
infinite random variables have finite entropy.


Example 4.61 Entropy of a Geometric Random Variable 


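The entropy of the geometric random variable with $S_X = \{0, 1, 2, \ldots\}$ is:

$$H_X = -\sum_{k=0}^{\infty} p(1 - p)^k \ln\left(p(1 - p)^k\right)$$

$$= -\ln p - \ln(1 - p) \sum_{k=0}^{\infty} k\,p(1 - p)^k$$

$$= -\ln p - \frac{(1 - p)\ln(1 - p)}{p}$$

$$= \frac{-p \ln p - (1 - p)\ln(1 - p)}{p} = \frac{h(p)}{p}, \qquad (4.111)$$

where h(p) is the entropy of a binary random variable. Note that $H_X = 2$ bits when p = 1/2.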
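
Equation (4.111) can be checked by summing the series directly. The sketch below truncates the sum at 500 terms, which is an assumption that is adequate for the values of p tried here because the tail decays geometrically; the function names are hypothetical.

```python
import math

def binary_entropy_nats(p):
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def geometric_entropy_sum(p, terms=500):
    """Truncated version of -sum_k p (1-p)^k ln(p (1-p)^k), in nats."""
    total = 0.0
    for k in range(terms):
        pk = p * (1.0 - p) ** k
        log_pk = math.log(p) + k * math.log(1.0 - p)   # log taken analytically
        total -= pk * log_pk
    return total

for p in (0.2, 0.5):
    print(p, geometric_entropy_sum(p), binary_entropy_nats(p) / p)

entropy_bits_half = geometric_entropy_sum(0.5) / math.log(2.0)
```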


For continuous random variables we have that P[X = x] = 0 for all x. Therefore
by Eq. (4.105) the uncertainty for every event {X = x} is infinite, and it follows from
Eq. (4.106) that the entropy of continuous random variables is infinite. The next example
takes a look at how the notion of entropy may be applied to continuous random
variables.


Example 4.62 Entropy of a Quantized Continuous Random Variable 


Let X be a continuous random variable that takes on values in the interval [a, b]. Suppose that
the interval [a, b] is divided into a large number K of subintervals of length $\Delta$. Let Q(X) be the
midpoint of the subinterval that contains X. Find the entropy of Q.
Let $x_k$ be the midpoint of the kth subinterval, then $P[Q = x_k] = P[X \text{ is in kth subinterval}]
= P[x_k - \Delta/2 < X < x_k + \Delta/2] \approx f_X(x_k)\Delta$, and thus

$$H_Q = -\sum_{k=1}^{K} P[Q = x_k] \ln P[Q = x_k]$$

$$\approx -\sum_{k=1}^{K} f_X(x_k)\Delta \ln\left(f_X(x_k)\Delta\right)$$

$$\approx -\ln(\Delta) - \sum_{k=1}^{K} f_X(x_k) \ln\left(f_X(x_k)\right)\Delta, \qquad (4.112)$$

where the last step uses $\sum_k f_X(x_k)\Delta \approx 1$.

The above equation shows that there is a tradeoff between the entropy of Q and the quantization
error X - Q(X). As $\Delta$ is decreased the error decreases, but the entropy increases without
bound, once again confirming the fact that the entropy of continuous random variables is
infinite.


In the final expression for $H_Q$ in Eq. (4.112), as $\Delta$ approaches zero, the first
term approaches infinity, but the second term approaches an integral which
may be finite in some cases. The differential entropy is defined by this integral:

$$H_X = -\int_{-\infty}^{\infty} f_X(x) \ln f_X(x)\,dx = -E[\ln f_X(X)]. \qquad (4.113)$$

In the above expression, we reuse the term $H_X$ with the understanding that we deal
with differential entropy when dealing with continuous random variables.
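
The relationship between the entropy of the quantized variable in Eq. (4.112) and the differential entropy in Eq. (4.113) can be illustrated numerically. The test pdf $f_X(x) = 2x$ on [0, 1] is an assumption chosen because its differential entropy has the closed form $1/2 - \ln 2$; the function names are hypothetical.

```python
import math

def quantized_entropy(cdf, a, b, K):
    """Entropy in nats of Q(X) when [a, b] is split into K equal subintervals."""
    delta = (b - a) / K
    h = 0.0
    for k in range(K):
        lo = a + k * delta
        prob = cdf(lo + delta) - cdf(lo)     # P[X in kth subinterval]
        if prob > 0.0:
            h -= prob * math.log(prob)
    return h

def triangular_cdf(x):
    """cdf of the illustrative pdf f_X(x) = 2x on [0, 1]."""
    return min(max(x, 0.0), 1.0) ** 2

DIFFERENTIAL_ENTROPY = 0.5 - math.log(2.0)   # -integral of 2x ln(2x) over [0, 1]

for K in (10, 100, 1000):
    delta = 1.0 / K
    print(K, quantized_entropy(triangular_cdf, 0.0, 1.0, K),
          -math.log(delta) + DIFFERENTIAL_ENTROPY)
```

As K grows, the two printed columns agree more closely while both increase like $-\ln \Delta$, which is the unbounded growth described above.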


Example 4.63 Differential Entropy of a Uniform Random Variable 


The differential entropy for X uniform in [a, b] is

$$H_X = -E\left[\ln\left(\frac{1}{b - a}\right)\right] = \ln(b - a). \qquad (4.114)$$


Example 4.64 Differential Entropy of a Gaussian Random Variable
The differential entropy for X, a Gaussian random variable (see Eq. 4.47), is

$$H_X = -E[\ln f_X(X)]$$

$$= -E\left[-\ln\sqrt{2\pi\sigma^2} - \frac{(X - m)^2}{2\sigma^2}\right]$$

$$= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2}$$

$$= \frac{1}{2}\ln(2\pi e \sigma^2). \qquad (4.115)$$


The entropy function and the differential entropy function differ in several fundamental
ways. In the next section we will see that the entropy of a random variable has a
well-defined operational interpretation as the average number of information bits
required to specify the value of the random variable. Differential entropy does not possess
this operational interpretation. In addition, the entropy function does not change when
the random variable X is mapped into Y by an invertible transformation. Again, the
differential entropy does not possess this property. (See Problems 4.153 and 4.160.)
Nevertheless, the differential entropy does possess some useful properties. The differential
entropy appears naturally in problems involving entropy reduction, as demonstrated in
Problem 4.159. In addition, the relative entropy of continuous random variables, which is
defined by

$$H(f_X; f_Y) = \int_{-\infty}^{\infty} f_X(x) \ln \frac{f_X(x)}{f_Y(x)}\,dx,$$

does not change under invertible transformations.


4.10.2 Entropy as a Measure of Information


Let X be a discrete random variable with $S_X = \{1, 2, \ldots, K\}$ and pmf $p_k = P[X = k]$.
Suppose that the experiment that produces X is performed by John, and that he attempts
to communicate the outcome to Mary by answering a series of yes/no questions.
We are interested in characterizing the minimum average number of questions required
to identify X.


Example 4.65 


An urn contains 16 balls: 4 balls are labeled "1", 4 are labeled "2", 2 are labeled "3", 2 are labeled
"4", and the remaining balls are labeled "5", "6", "7", and "8." John picks a ball from the urn at
random, and he notes the number. Discuss what strategies Mary can use to find out the number
of the ball through a series of yes/no questions. Compare the average number of questions asked
to the entropy of X.

If we let X be the random variable denoting the number of the ball, then $S_X = \{1, 2, \ldots, 8\}$
and the pmf is p = (1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16). We will compare the two strategies
shown in Figs. 4.26(a) and (b).

The series of questions in Fig. 4.26(a) uses the fact that the probability of {X = k} decreases
with k. Thus it is reasonable to ask the questions "Was X equal to 1?", "Was X equal to
2?", and so on, until the outcome is identified. Note that the seventh question also settles
the outcome X = 8, so at most seven questions are needed. Let L be the number of questions
asked, then the average number of questions asked is

$$E[L] = 1\left(\tfrac{1}{4}\right) + 2\left(\tfrac{1}{4}\right) + 3\left(\tfrac{1}{8}\right) + 4\left(\tfrac{1}{8}\right) + 5\left(\tfrac{1}{16}\right) + 6\left(\tfrac{1}{16}\right) + 7\left(\tfrac{1}{16}\right) + 7\left(\tfrac{1}{16}\right) = 51/16.$$

The series of questions in Fig. 4.26(b) uses the observation made in Example 4.59 that
yes/no questions should be designed so that the two answers are equiprobable. The questions in
Fig. 4.26(b) meet this requirement. The average number of questions asked is

$$E[L] = 2\left(\tfrac{1}{4}\right) + 2\left(\tfrac{1}{4}\right) + 3\left(\tfrac{1}{8}\right) + 3\left(\tfrac{1}{8}\right) + 4\left(\tfrac{1}{16}\right) + 4\left(\tfrac{1}{16}\right) + 4\left(\tfrac{1}{16}\right) + 4\left(\tfrac{1}{16}\right) = 44/16.$$

FIGURE 4.26
Two strategies for finding out the value of X through a series of yes/no questions.

Thus the second series of questions has the better performance.
Finally, we find that the entropy of X is

$$H_X = -\frac{1}{4}\log_2\frac{1}{4} - \frac{1}{4}\log_2\frac{1}{4} - \frac{1}{8}\log_2\frac{1}{8} - \cdots - \frac{1}{16}\log_2\frac{1}{16} = 44/16,$$

which is equal to the performance of the second series of questions.
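
The arithmetic in this example can be reproduced exactly with rational numbers. In the sketch below the variable and function names are assumptions; the two lists of question counts are read off the two strategies described above.

```python
import math
from fractions import Fraction

pmf = [Fraction(1, 4)] * 2 + [Fraction(1, 8)] * 2 + [Fraction(1, 16)] * 4

lengths_a = [1, 2, 3, 4, 5, 6, 7, 7]   # sequential "Was X equal to k?" questions
lengths_b = [2, 2, 3, 3, 4, 4, 4, 4]   # questions with equiprobable answers

def average_length(pmf, lengths):
    """E[L] = sum_k l_k P[X = k], computed exactly."""
    return sum(p * l for p, l in zip(pmf, lengths))

entropy_bits = -sum(float(p) * math.log2(float(p)) for p in pmf)

print(average_length(pmf, lengths_a), average_length(pmf, lengths_b), entropy_bits)
```

Note that in strategy (b) each outcome k is identified after exactly $-\log_2 P[X = k]$ questions, which is why its average equals the entropy.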


The problem of designing the series of questions to identify the random variable 
X is exactly the same as the problem of encoding the output of an information source. 
Each output of an information source is a random variable X, and the task of the en- 
coder is to map each possible output into a unique string of binary digits. We can see 
this correspondence by taking the trees in Fig. 4.26 and identifying each yes/no answer 
with a 0/1. The sequence of 0’s and 1’s from the top node to each terminal node then 
defines the binary string (“codeword”) for each outcome. It then follows that the prob- 
lem of finding the best series of yes/no questions is the same as finding the binary tree 
code that minimizes the average codeword length. 

In the remainder of this section we develop the following fundamental results 
from information theory. First, the average codeword length of any code cannot be less 
than the entropy. Second, if the pmf of X consists of powers of 1/2, then there is a tree 
code that achieves the entropy. And finally, by encoding groups of outcomes of X we 
can achieve average codeword length arbitrarily close to the entropy. Thus the entropy 
of X represents the minimum average number of bits required to establish the outcome 
of X. 

First, let's show that the average codeword length of any tree code cannot be less
than the entropy. Note from Fig. 4.26 that the set of lengths $\{l_k\}$ of the codewords for
every complete binary tree must satisfy

$$\sum_{k=1}^{K} 2^{-l_k} \le 1. \qquad (4.116)$$

To see this, extend the tree to the same depth as the longest codeword, as shown in Fig. 4.27.
If we then "prune" the tree at a node of depth $l_k$, we remove a fraction $2^{-l_k}$ of the nodes at
the bottom of the tree. Note that the converse result is also true: If a set of codeword
lengths satisfies Eq. (4.116), then we can construct a tree code with these lengths.
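
The converse statement can be made concrete with a short sketch that builds a binary tree code from any set of lengths satisfying Eq. (4.116). The construction used here (assign codewords in order of increasing length, counting upward in binary) is one standard choice and is an assumption of this sketch, not a method given in the text; the function names are hypothetical.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Left-hand side of Eq. (4.116), computed exactly."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def code_from_lengths(lengths):
    """Return binary codewords with the given lengths, no codeword a prefix of another."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate Eq. (4.116)")
    order = sorted(range(len(lengths)), key=lambda k: lengths[k])
    codewords = [None] * len(lengths)
    value, prev_len = 0, 0
    for k in order:
        value <<= lengths[k] - prev_len          # descend to depth l_k
        codewords[k] = format(value, "0{}b".format(lengths[k]))
        value += 1                               # next unused node at this depth
        prev_len = lengths[k]
    return codewords

def is_prefix_free(codewords):
    return not any(i != j and codewords[j].startswith(codewords[i])
                   for i in range(len(codewords)) for j in range(len(codewords)))

codewords = code_from_lengths([2, 2, 3, 3, 4, 4, 4, 4])
print(codewords, kraft_sum([2, 2, 3, 3, 4, 4, 4, 4]))
```

The lengths used in the example call are those of strategy (b) in Example 4.65, for which the sum in Eq. (4.116) equals 1.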

FIGURE 4.27
Extension of a binary tree code to a full tree.

Consider next the difference between the entropy and E[L] for any binary
tree code:

$$E[L] - H_X = \sum_{k=1}^{K} l_k P[X = k] + \sum_{k=1}^{K} P[X = k] \log_2 P[X = k] = \sum_{k=1}^{K} P[X = k] \log_2 \frac{P[X = k]}{2^{-l_k}}, \qquad (4.117)$$

where we have expressed the entropy in bits. Equation (4.117) is the relative entropy of
Eq. (4.107) with $q_k = 2^{-l_k}$. Thus by Eq. (4.108)

$$E[L] \ge H_X \quad \text{with equality iff } P[X = k] = 2^{-l_k}. \qquad (4.118)$$


Thus the average number of questions for any tree code (and in particular the best tree
code) cannot be less than the entropy of X. Therefore we can use the entropy $H_X$ as a
baseline against which to test any code.

Equation (4.118) also implies that if the outcomes of X all have probabilities that
are integer powers of 1/2 (as in Example 4.65), then we can find a tree code that
achieves the entropy. If $P[X = k] = 2^{-l_k}$, then we assign the outcome k a binary
codeword of length $l_k$. We can show that we can always find a tree code with these lengths
by using the fact that the probabilities add to one, and hence the codeword lengths
satisfy Eq. (4.116). Equation (4.118) then implies that $E[L] = H_X$.

It is clear that Eq. (4.117) will be nonzero if the $p_k$'s are not integer powers of 1/2.
Thus in general the best tree code does not always have $E[L] = H_X$. However, it is
possible to show that the approach of grouping outcomes into sets that are approximately
equiprobable leads to tree codes with lengths that are close to the entropy.
Furthermore, by encoding vectors of outcomes of X, it is possible to obtain average
codeword lengths that are arbitrarily close to the entropy. Problem 4.165 discusses how
this is done.

We have now reached our objective of showing that the entropy of a random 
variable X represents the minimum average number of bits required to identify its 
value. Before proceeding, let’s reconsider continuous random variables. A continuous 
random variable can assume values from an uncountably infinite set, so in general an 
infinite number of bits is required to specify its value. Thus, the interpretation of en- 
tropy as the average number of bits required to specify a random variable immediate- 
ly implies that continuous random variables have infinite entropy. This implies that any 
representation of a continuous random variable that uses a finite number of bits will 
inherently involve some approximation error. 
4.10.3 The Method of Maximum Entropy


Let X be a random variable with $S_X = \{x_1, x_2, \ldots, x_K\}$ and unknown pmf
$p_k = P[X = x_k]$. Suppose that we are asked to estimate the pmf of X given the expected
value of some function g(X) of X:

$$\sum_{k=1}^{K} g(x_k) P[X = x_k] = c. \qquad (4.119)$$

For example, if g(X) = X then c = E[g(X)] = E[X], and if $g(X) = (X - E[X])^2$
then c = VAR[X]. Clearly, this problem is underdetermined since knowledge of these
parameters is not sufficient to specify the pmf uniquely. The method of maximum
entropy approaches this problem by seeking the pmf that maximizes the entropy subject
to the constraint in Eq. (4.119).

Suppose we set up this maximization problem by using Lagrange multipliers:

$$H_X - \lambda\left(\sum_{k=1}^{K} P[X = x_k]\,g(x_k) - c\right) = -\sum_{k=1}^{K} P[X = x_k] \ln \frac{P[X = x_k]}{C e^{-\lambda g(x_k)}}, \qquad (4.120)$$

where $C = e^{\lambda c}$. Note that if $\{C e^{-\lambda g(x_k)}\}$ forms a pmf, then the above expression is the
negative value of the relative entropy of this pmf with respect to p. Equation (4.108)
then implies that the expression in Eq. (4.120) is always less than or equal to zero with
equality iff $P[X = x_k] = C e^{-\lambda g(x_k)}$. We now show that this does indeed lead to the
maximum entropy solution.

Suppose that the random variable X has pmf $p_k = C e^{-\lambda g(x_k)}$, where C and $\lambda$ are
chosen so that Eq. (4.119) is satisfied and so that $\{p_k\}$ is a pmf. X then has entropy

$$H_X = E[-\ln P[X]] = E[-\ln C e^{-\lambda g(X)}] = -\ln C + \lambda E[g(X)] = -\ln C + \lambda c. \qquad (4.121)$$

Now let's compare the entropy in Eq. (4.121) to that of some other pmf $q_k$ that also
satisfies the constraint in Eq. (4.119). Consider the relative entropy of p with
respect to q:

$$0 \le H(q; p) = \sum_{k=1}^{K} q_k \ln \frac{q_k}{p_k} = \sum_{k=1}^{K} q_k \ln q_k + \sum_{k=1}^{K} q_k\left(-\ln C + \lambda g(x_k)\right) = -\ln C + \lambda c - H(q) = H_X - H(q). \qquad (4.122)$$

Thus $H_X \ge H(q)$, and p achieves the highest entropy.


Example 4.66 


Let X be a random variable with $S_X = \{0, 1, \ldots\}$ and expected value E[X] = m. Find the pmf
of X that maximizes the entropy.
In this example g(X) = X, so

$$p_k = C e^{-\lambda k} = C \alpha^k,$$

where $\alpha = e^{-\lambda}$. Clearly, X is a geometric random variable with mean $m = \alpha/(1 - \alpha)$ and thus
$\alpha = m/(m + 1)$. It then follows that $C = 1 - \alpha = 1/(m + 1)$.
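
As a numerical illustration, the sketch below builds the maximum entropy pmf for a given mean and compares its entropy with that of another pmf on the same set with the same mean. The Poisson pmf is used as the comparison purely as an assumption of this sketch; the value m = 3, the truncation at 400 terms, and all function names are also illustrative.

```python
import math

def entropy_nats(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def mean(pmf):
    return sum(k * p for k, p in enumerate(pmf))

def max_entropy_pmf(m, terms=400):
    """p_k = C alpha^k with alpha = m/(m+1) and C = 1/(m+1), truncated to `terms` values."""
    alpha = m / (m + 1.0)
    C = 1.0 / (m + 1.0)
    return [C * alpha ** k for k in range(terms)]

def poisson_pmf(m, terms=400):
    """Comparison pmf with the same mean, built by the recursion p_k = p_{k-1} m / k."""
    pmf = [math.exp(-m)]
    for k in range(1, terms):
        pmf.append(pmf[-1] * m / k)
    return pmf

m = 3.0
geometric = max_entropy_pmf(m)
poisson = poisson_pmf(m)
print(mean(geometric), mean(poisson))
print(entropy_nats(geometric), entropy_nats(poisson))
```

The geometric entropy also agrees with Eq. (4.121), $-\ln C + \lambda c$, with $\lambda = -\ln \alpha$ and c = m.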


When dealing with continuous random variables, the method of maximum entropy
maximizes the differential entropy:

$$-\int_{-\infty}^{\infty} f_X(x) \ln f_X(x)\,dx. \qquad (4.123)$$

The parameter information is in the form

$$c = E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx. \qquad (4.124)$$

The relative entropy expression for continuous random variables given at the end of
Section 4.10.1 and the approach used for discrete random variables can be used to show
that the pdf $f_X(x)$ that maximizes the differential entropy will have the form

$$f_X(x) = C e^{-\lambda g(x)}, \qquad (4.125)$$

where C and $\lambda$ must be chosen so that Eq. (4.125) integrates to one and so that Eq. (4.124)
is satisfied.

Example 4.67 


Suppose that the continuous random variable X has known variance $\sigma^2 = E[(X - m)^2]$, where
the mean m is not specified. Find the pdf that maximizes the entropy of X.
Equation (4.125) implies that the pdf has the form

$$f_X(x) = C e^{-\lambda (x - m)^2}.$$

We can meet the constraint in Eq. (4.124) by picking

$$\lambda = \frac{1}{2\sigma^2}, \qquad C = \frac{1}{\sqrt{2\pi\sigma^2}}.$$

We thus obtain a Gaussian pdf with variance $\sigma^2$. Note that the mean m is arbitrary; that is, any
choice of m yields a pdf that maximizes the differential entropy.


The method of maximum entropy can be extended to the case where several pa- 
rameters of the random variable X are known. It can also be extended to the case of 
vectors and sequences of random variables. 
The cumulative distribution function $F_X(x)$ is the probability that X falls in the
interval $(-\infty, x]$. The probability of any event consisting of the union of intervals
can be expressed in terms of the cdf.


A random variable is continuous if its cdf can be written as the integral of a non- 
negative function. A random variable is mixed if it is a mixture of a discrete and a 
continuous random variable. 


The probability of events involving a continuous random variable X can be expressed
as integrals of the probability density function $f_X(x)$.


If X is a random variable, then Y = g(X) is also a random variable. The notion of 
equivalent events allows us to derive expressions for the cdf and pdf of Y in terms 
of the cdf and pdf of X. 


The cdf and pdf of the random variable X are sufficient to compute all probabili- 
ties involving X alone. The mean, variance, and moments of a random variable 
summarize some of the information about the random variable X. These parame- 
ters are useful in practice because they are easier to measure and estimate than 
the cdf and pdf. 


Conditional cdf’s or pdf’s incorporate partial knowledge about the outcome of an 
experiment in the calculation of probabilities of events. 


The Markov and Chebyshev inequalities allow us to bound probabilities involv- 
ing X in terms of its first two moments only. 


Transforms provide an alternative but equivalent representation of the pmf and 
pdf. In certain types of problems it is preferable to work with the transforms 
rather than the pmf or pdf. The moments of a random variable can be obtained 
from the corresponding transform. 


The reliability of a system is the probability that it is still functioning after t hours 
of operation. The reliability of a system can be determined from the reliability of 
its subsystems. 


There are a number of methods for generating random variables with prescribed 
pmf’s or pdf’s in terms of a random variable that is uniformly distributed in the 
unit interval. These methods include the transformation and the rejection meth- 
ods as well as methods that simulate random experiments (e.g., functions of ran- 
dom variables) and mixtures of random variables. 


The entropy of a random variable X is a measure of the uncertainty of X in terms 
of the average amount of information required to identify its value. 


The maximum entropy method is a procedure for estimating the pmf or pdf of a 
random variable when only partial information about X, in the form of expected 
values of functions of X, is available. 
Characteristic function
Chebyshev inequality
Chernoff bound
Conditional cdf, pdf
Continuous random variable
Cumulative distribution function
Differential entropy
Discrete random variable
Entropy
Equivalent event
Expected value of X
Failure rate function
Function of a random variable
Laplace transform of the pdf
Markov inequality
Maximum entropy method
Mean time to failure (MTTF)
Moment theorem
nth moment of X
Probability density function
Probability generating function
Probability mass function
Random variable
Random variable of mixed type
Rejection method
Reliability
Standard deviation of X
Transformation method
Variance of X


ANNOTATED REFERENCES 


Reference [1] is the standard reference for electrical engineers for the material on ran- 
dom variables. Reference [2] is entirely devoted to continuous distributions. Reference 
[3] discusses some of the finer points regarding the concept of a random variable at a 
level accessible to students of this course. Reference [4] presents detailed discussions 
of the various methods for generating random numbers with specified distributions. 
Reference [5] also discusses the generation of random variables. Reference [9] is fo- 
cused on signal processing. Reference [11] discusses entropy in the context of informa- 
tion theory. 


1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes,
McGraw-Hill, New York, 2002.
2. N. Johnson et al., Continuous Univariate Distributions, vol. 2, Wiley, New York,
1995.
3. K. L. Chung, Elementary Probability Theory, Springer-Verlag, New York, 1974.
4. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill,
New York, 2000.
5. S. M. Ross, Introduction to Probability Models, Academic Press, New York, 2003.
6. H. Cramer, Mathematical Methods of Statistics, Princeton University Press,
Princeton, N.J., 1946.
7. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, National
Bureau of Standards, Washington, D.C., 1964. Downloadable: www.math.sfu.ca/~cbm/aands/.
8. R. C. Cheng, "The Generation of Gamma Variables with Nonintegral Shape
Parameter," Appl. Statist., 26: 71-75, 1977.
9. R. Gray and L. D. Davisson, An Introduction to Statistical Signal Processing,
Cambridge Univ. Press, Cambridge, UK, 2005.
10. P. O. Börjesson and C. E. W. Sundberg, "Simple Approximations of the Error
Function Q(x) for Communications Applications," IEEE Trans. on Communications,
March 1979, 639-643.
11. R. G. Gallager, Information Theory and Reliable Communication, Wiley, New
York, 1968.


PROBLEMS

Section 4.1: The Cumulative Distribution Function 


4.1. 


4.2. 


4.3. 


4.4. 


4.5. 


4.6. 


4.7. 


An information source produces binary pairs that we designate as Sy = {1, 2, 3,4} with 
the following pmf’s: 


(i) pp = pi/k for all k in Sy. 
Gi) pes) = p,/2 fork = 2,3, 4. 
Gii) pea, = py/2* for k = 2,3, 4. 
(a) Plot the cdf of these three random variables. 
(b) Use the cdf to find the probability of the events: {X = 1},{X < 2.5}, 
{05< X =2},{1< xX < 4}. 
A die is tossed. Let X be the number of full pairs of dots in the face showing up, and Y be the 
number of full or partial pairs of dots in the face showing up. Find and plot the cdf of X and Y. 
The loose minute hand of a clock is spun hard. The coordinates (x, y) of the point where 
the tip of the hand comes to rest is noted. Z is defined as the sgn function of the product 
of x and y, where sgn(f) is 1 if t > 0, 0ift = 0, and —1ift < 0. 
(a) Find and plot the cdf of the random variable X. 
(b) Does the cdf change if the clock hand has a propensity to stop at 3,6,9, and 12 o’clock? 


An urn contains 8 $1 bills and two $5 bills. Let X be the total amount that results when 

two bills are drawn from the urn without replacement, and let Y be the total amount that 

results when two bills are drawn from the urn with replacement. 

(a) Plot and compare the cdf’s of the random variables. 

(b) Use the cdf to compare the probabilities of the following events in the two prob- 
lems: {X = $2}, {X < $7}, {X = 6}. 

Let Y be the difference between the number of heads and the number of tails in the 3 

tosses of a fair coin. 

(a) Plot the cdf of the random variable Y. 

(b) Express P[|Y| < y] in terms of the cdf of Y. 


A dart is equally likely to land at any point inside a circular target of radius 2. Let R be 
the distance of the landing point from the origin. 


(a) Find the sample space S and the sample space of R, Sp. 

(b) Show the mapping from S to Sp. 

(c) The “bulls eye” is the central disk in the target of radius 0.25. Find the event A in Sg 
corresponding to “dart hits the bull’s eye.” Find the equivalent event in S and P[A]. 

(d) Find and plot the cdf of R. 

A point is selected at random inside a square defined by {(x, y): 0 ≤ x ≤ b, 0 ≤ y ≤ b}.

Assume the point is equally likely to fall anywhere in the square. Let the random variable 

Z be given by the minimum of the two coordinates of the point where the dart lands. 

(a) Find the sample space S and the sample space of Z, Sz. 
4.8.


4.9. 


4.10. 


4.11. 


4.12. 


4.13. 


4.14. 


4.15. 
(b) Show the mapping from S to S_Z.

(c) Find the region in the square corresponding to the event {Z = z}. 

(d) Find and plot the cdf of Z. 

(e) Use the cdf to find: P[Z > 0], P[Z > b], P[Z = b/2], P[Z > b/4].

Let ζ be a point selected at random from the unit interval. Consider the random variable
X = (1 − ζ)^{−1/2}.

(a) Sketch X as a function of ζ.

(b) Find and plot the cdf of X. 

(c) Find the probability of the events {X > 1}, {5 < X < 7}, {X = 20}. 


The loose hand of a clock is spun hard and the outcome ζ is the angle in the range [0, 2π)
where the hand comes to rest. Consider the random variable X(ζ) = 2 sin(ζ/4).


(a) Sketch X as a function of ζ.
(b) Find and plot the cdf of X.
(c) Find the probability of the events {X > 1}, {−1/2 < X < 1/2}, {X ≤ 1/√2}.


Repeat Problem 4.9 if 80% of the time the hand comes to rest anywhere in the circle, but 

20% of the time the hand comes to rest at 3, 6, 9, or 12 o’clock. 

The random variable X is uniformly distributed in the interval [—1, 2]. 

(a) Find and plot the cdf of X. 

(b) Use the cdf to find the probabilities of the following events: {X = 0}, 
{|X − 0.5| < 1}, and C = {X > −0.5}.

The cdf of the random variable X is given by:

\[
F_X(x) = \begin{cases} 0 & x < -1 \\ 0.5 & -1 \le x \le 0 \\ (1 + x)/2 & 0 \le x \le 1 \\ 1 & x \ge 1. \end{cases}
\]


(a) Plot the cdf and identify the type of random variable. 
(b) Find P[X = −1], P[X = −1], P[X < 0.5], P[−0.5 < X < 0.5], P[X > −1],
P[X = 2], P[X > 3].
A random variable X has cdf: 
0 forx < 0 


Fx(x)=ļ41-— we for x = 0. 


(a) Plot the cdf and identify the type of random variable. 

(b) Find P[X = 2], P[X = 0], P[X < 0], P[2 < X < 6], P[X > 10]. 

The random variable X has cdf shown in Fig. P4.1. 

(a) What type of random variable is X? 

(b) Find the following probabilities: P[X < −1], P[X ≤ −1], P[−1 < X < −0.75],
P[−0.5 < X < 0], P[−0.5 ≤ X < 0.5], P[|X − 0.5| < 0.5].

For β > 0 and λ > 0, the Weibull random variable Y has cdf:

\[
F_Y(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - e^{-(x/\lambda)^\beta} & \text{for } x \ge 0. \end{cases}
\]


Problems 


FIGURE P4.1 


(a) Plot the cdf of Y for B = 0.5, 1, and 2. 


(b) Find the probability P[jλ < X < (j + 1)λ] and P[X > jλ].


(c) Plot log P[X > x] vs. log x.
4.16. The random variable X has cdf:

\[
F_X(x) = \begin{cases} 0 & x < 0 \\ 0.5 + c\sin^2(\pi x/2) & 0 \le x \le 1 \\ 1 & x > 1. \end{cases}
\]

(a) What values can c assume?

(b) Plot the cdf.

(c) Find P[X > 0].

Section 4.2: The Probability Density Function

4.17. A random variable X has pdf:

\[
f_X(x) = \begin{cases} \dots & -1 \le x \le 1 \\ 0 & \text{elsewhere.} \end{cases}
\]

(a) Find c and plot the pdf.
(b) Plot the cdf of X.
(c) Find P[X = 0], P[0 < X < 0.5], and P[|X − 0.5| < 0.25].


4.18. A random variable X has pdf:

\[
f_X(x) = \begin{cases} c\,x(1 - x^2) & 0 \le x \le 1 \\ 0 & \text{elsewhere.} \end{cases}
\]

(a) Find c and plot the pdf.
(b) Plot the cdf of X.
(c) Find P[0 < X < 0.5], P[X = 1], P[.25 < X < 0.5].
4.19. (a) In Problem 4.6, find and plot the pdf of the random variable R, the distance from the 


dart to the center of the target. 


(b) Use the pdf to find the probability that the dart is outside the bull’s eye. 
4.20. (a) Find and plot the pdf of the random variable Z in Problem 4.7. 
(b) Use the pdf to find the probability that the minimum is greater than b/3. 
4.21.
4.22. 
4.23. 
4.24. 


4.25. 
4.26. 


4.27. 


4.28. 
4.29. 


4.30. 


4.31. 


4.32. 
4.33. 


4.34. 


4.35. 


4.36. 
(a) Find and plot the pdf in Problem 4.8.
(b) Use the pdf to find the probabilities of the events: {X > a} and {X > 2a}. 
(a) Find and plot the pdf in Problem 4.12. 
(b) Use the pdf to find P[−1 ≤ X < 0.25].
(a) Find and plot the pdf in Problem 4.13. 
(b) Use the pdf to find P[ X = 0], P[X > 8]. 
(a) Find and plot the pdf of the random variable in Problem 4.14. 
(b) Use the pdf to calculate the probabilities in Problem 4.14b. 
Find and plot the pdf of the Weibull random variable in Problem 4.15a. 
Find the cdf of the Cauchy random variable which has pdf:

\[
f_X(x) = \frac{\alpha/\pi}{x^2 + \alpha^2}, \qquad -\infty < x < \infty.
\]

A voltage X is uniformly distributed in the set {−3, −2, ..., 3, 4}.
(a) Find the pdf and cdf of the random variable X.

(b) Find the pdf and cdf of the random variable Y = −2X² + 3.
(c) Find the pdf and cdf of the random variable W = cos(πX/8).
(d) Find the pdf and cdf of the random variable Z = cos²(πX/8).
Find the pdf and cdf of the Zipf random variable in Problem 3.70. 


Let C be an event for which P[C] > 0. Show that F_X(x|C) satisfies the eight properties of
a cdf.


(a) In Problem 4.13, find F_X(x|C) where C = {X > 0}.

(b) Find F_X(x|C) where C = {X = 0}.

(a) In Problem 4.10, find F_X(x|B) where B = {hand does not stop at 3, 6, 9, or 12
o'clock}.

(b) Find F_X(x|B^c).

In Problem 4.13, find f_X(x|B) and F_X(x|B) where B = {X > 0.25}.

Let X be the exponential random variable.

(a) Find and plot F_X(x|X > t). How does F_X(x|X > t) differ from F_X(x)?

(b) Find and plot f_X(x|X > t).

(c) Show that P[X > t + x | X > t] = P[X > x]. Explain why this is called the memoryless
property.
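The memoryless identity in part (c) can be checked numerically before proving it. The sketch below assumes the exponential tail P[X > x] = e^{−λx}; the function names and the rate argument `lam` are illustrative choices, not notation from the text.

```python
import math

def exp_tail(x, lam=1.0):
    """P[X > x] for an exponential random variable with rate lam."""
    return math.exp(-lam * x) if x >= 0 else 1.0

def conditional_tail(x, t, lam=1.0):
    """P[X > t + x | X > t] = P[X > t + x] / P[X > t]."""
    return exp_tail(t + x, lam) / exp_tail(t, lam)

if __name__ == "__main__":
    # The conditional tail should not depend on the elapsed time t.
    for t in (0.0, 1.0, 5.0):
        print(t, conditional_tail(2.0, t), exp_tail(2.0))
```

The printed conditional tail is the same for every t, which is the numerical counterpart of the statement to be proved.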

The Pareto random variable X has cdf: 


(a) Find and plot the pdf of X. 

(b) Repeat Problem 4.33 parts a and b for the Pareto random variable. 

(c) What happens to P[X > t + x|X > t] as tbecomes large? Interpret this result. 
(a) Find and plot F_X(x | a ≤ X ≤ b). Compare F_X(x | a ≤ X ≤ b) to F_X(x).

(b) Find and plot f_X(x | a ≤ X ≤ b).

In Problem 4.6, find F_R(r | R > 1) and f_R(r | R > 1).


4.37. 


4.38. 
(a) In Problem 4.7, find F_Z(z | b/4 ≤ Z < b/2) and f_Z(z | b/4 < Z ≤ b/2).

(b) Find F_Z(z|B) and f_Z(z|B), where B = {x > b/2}.

A binary transmission system sends a “0” bit using a —1 voltage signal and a “1” bit by 

transmitting a +1. The received signal is corrupted by noise N that has a Laplacian distri- 

bution with parameter a. Assume that “0” bits and “1” bits are equiprobable. 

(a) Find the pdf of the received signal Y = X + N, where X is the transmitted signal, 
given that a “0” was transmitted; that a “1” was transmitted. 

(b) Suppose that the receiver decides a “0” was sent if Y < 0, and a “1” was sent if
Y ≥ 0. What is the probability that the receiver makes an error given that a +1 was
transmitted? a −1 was transmitted?

(c) What is the overall probability of error? 


Section 4.3: The Expected Value of X 


4.39. 
4.40. 
4.41. 
4.42. 
4.43. 
4.44. 
4.45. 
4.46. 


4.47. 
4.48. 
4.49. 


4.50. 
4.51. 
4.52. 
4.53. 


4.54. 


Find the mean and variance of X in Problem 4.17. 

Find the mean and variance of X in Problem 4.18. 

Find the mean and variance of Y, the distance from the dart to the origin, in Problem 4.19. 
Find the mean and variance of Z, the minimum of the coordinates in a square, in Problem 4.20. 
Find the mean and variance of X = (1 − ζ)^{−1/2} in Problem 4.21. Find E[X] using Eq. (4.28).
Find the mean and variance of X in Problems 4.12 and 4.22. 

Find the mean and variance of X in Problems 4.13 and 4.23. Find E[X] using Eq. (4.28). 
Find the mean and variance of the Gaussian random variable by direct integration of 
Eqs. (4.27) and (4.34). 

Prove Eqs. (4.28) and (4.29). 

Find the variance of the exponential random variable. 


(a) Show that the mean of the Weibull random variable in Problem 4.15 is Γ(1 + 1/β)
where Γ(x) is the gamma function defined in Eq. (4.56).


(b) Find the second moment and the variance of the Weibull random variable. 

Explain why the mean of the Cauchy random variable does not exist. 

Show that E[X] does not exist for the Pareto random variable with α = 1 and x_m = 1.
Verify Eqs. (4.36), (4.37), and (4.38). 

Let Y = A cos(ωt) + c where A has mean m and variance σ² and ω and c are constants.
Find the mean and variance of Y. Compare the results to those obtained in Example 4.15. 


A limiter is shown in Fig. P4.2. 


FIGURE P4.2 
(a) Find an expression for the mean and variance of Y = g(X) for an arbitrary continuous
random variable X.

(b) Evaluate the mean and variance if X is a Laplacian random variable with λ = a = 1.

(c) Repeat part (b) if X is from Problem 4.17 with a = 1/2.

(d) Evaluate the mean and variance if X = U² where U is a uniform random variable in
the unit interval, [−1, 1] and a = 1/2.


4.55. A limiter with center-level clipping is shown in Fig. P4.3. 


(a) Find an expression for the mean and variance of Y = g(X) for an arbitrary continuous
random variable X.

(b) Evaluate the mean and variance if X is Laplacian with λ = a = 1 and b = 2.

(c) Repeat part (b) if X is from Problem 4.22, a = 1/2, b = 3/2.

(d) Evaluate the mean and variance if X = b cos(2πU) where U is a uniform random
variable in the unit interval [−1, 1] and a = 3/4, b = 1/2.


FIGURE P4.3 


4.56. Let Y = 3X + 2. 


(a) Find the mean and variance of Y in terms of the mean and variance of X.

(b) Evaluate the mean and variance of Y if X is Laplacian.

(c) Evaluate the mean and variance of Y if X is an arbitrary Gaussian random variable.

(d) Evaluate the mean and variance of Y if X = b cos(2πU) where U is a uniform random
variable in the unit interval.


4.57. Find the nth moment of U, the uniform random variable in the unit interval. Repeat for X 
uniform in [a, b]. 


4.58. 


Consider the quantizer in Example 4.20. 


(a) Find the conditional pdf of X given that X is in the interval (d, 2d).

(b) Find the conditional expected value and conditional variance of X given that X is in
the interval (d, 2d).

(c) Now suppose that when X falls in (d, 2d), it is mapped onto the point c where
d < c < 2d. Find an expression for the expected value of the mean square error:
E[(X − c)² | d < X < 2d].

(d) Find the value c that minimizes the above mean square error. Is c the midpoint of 
the interval? Explain why or why not by sketching possible conditional pdf shapes. 

(e) Find an expression for the overall mean square error using the approach in parts c and d. 


Section 4.4: Important Continuous Random Variables 


4.59. 
4.60. 


4.61. 


4.62. 


4.63. 


4.64. 
4.65. 
4.66. 


4.67. 


Let X be a uniform random variable in the interval [—2, 2]. Find and plot P[|X| > x]. 
In Example 4.20, let the input to the quantizer be a uniform random variable in the inter- 
val [—4d, 4d]. Show that Z = X — Q(X) is uniformly distributed in [—d/2, d/2]. 

Let X be an exponential random variable with parameter λ.

(a) For d > 0 and k a nonnegative integer, find P[kd < X < (k + 1)d].

(b) Segment the positive real line into four equiprobable disjoint intervals. 

The rth percentile, π(r), of a random variable X is defined by P[X ≤ π(r)] = r/100.


(a) Find the 90%, 95%, and 99% percentiles of the exponential random variable with
parameter λ.


(b) Repeat part a for the Gaussian random variable with parameters m = 0 and σ².
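A closed-form answer to the percentile problem can be checked against a short numerical sketch. It assumes the exponential cdf F(x) = 1 − e^{−λx} and uses Python's standard-library `statistics.NormalDist` for the Gaussian case; the function names are illustrative only.

```python
import math
from statistics import NormalDist

def exponential_percentile(r, lam):
    """Return pi(r) with P[X <= pi(r)] = r/100 for an exponential rv of rate lam."""
    # Invert F(x) = 1 - exp(-lam * x).
    return -math.log(1.0 - r / 100.0) / lam

def gaussian_percentile(r, m=0.0, sigma=1.0):
    """Return pi(r) for a Gaussian rv with mean m and standard deviation sigma."""
    return NormalDist(mu=m, sigma=sigma).inv_cdf(r / 100.0)

if __name__ == "__main__":
    for r in (90, 95, 99):
        print(r, exponential_percentile(r, lam=1.0), gaussian_percentile(r))
```

Note that `gaussian_percentile` takes the standard deviation, not the variance, so a problem stated in terms of σ² needs a square root first.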
Let X be a Gaussian random variable with m = 5 and σ² = 16.

(a) Find P[X > 4], P[X = 7], P[6.72 < X < 10.16], P[2 < X < 7], P[6 = X = 8]. 
(b) P[X < a] = 0.8869, find a. 

(c) P[X > b] = 0.11131, find b.

(d) P[13 < X ≤ c] = 0.0123, find c.

Show that the Q-function for the Gaussian random variable satisfies Q(—x) = 1 — Q(x). 
Use Octave to generate Tables 4.2 and 4.3. 

Let X be a Gaussian random variable with mean m and variance σ².
(a) Find P[X = m].

(b) Find P[|X − m| < kσ], for k = 1, 2, 3, 4, 5, 6.

(c) Find the value of k for which Q(k) = P[X > m + kσ] = 10^{−j} for j = 1, 2, 3, 4, 5, 6.
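The Q-function problems above can be explored with a few lines of Python. This is a sketch that assumes the standard identity Q(x) = ½ erfc(x/√2); the bisection search in `inverse_Q` and its bracketing interval are illustrative choices, not a method prescribed by the text.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P[Z > x] for zero-mean, unit-variance Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def prob_within(k):
    """P[|X - m| < k*sigma] = 1 - 2 Q(k) for any Gaussian X."""
    return 1.0 - 2.0 * Q(k)

def inverse_Q(p, lo=0.0, hi=40.0, iterations=200):
    """Solve Q(k) = p for 0 < p <= 0.5 by bisection (Q is decreasing)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid      # tail still too large, move right
        else:
            hi = mid      # tail small enough, move left
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for k in range(1, 7):
        print(k, prob_within(k))
    for j in range(1, 7):
        print(j, inverse_Q(10.0 ** (-j)))
```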
A binary transmission system transmits a signal X (—1 to send a “0” bit; +1 to send a “1” 


bit). The received signal is Y = X + N where noise N has a zero-mean Gaussian distribution
with variance σ². Assume that “0” bits are three times as likely as “1” bits.


(a) Find the conditional pdf of Y given the input value: fy(y|X = +1) and 
fy(y|X = -1). 
(b) The receiver decides a “0” was transmitted if the observed value of y satisfies 


f_Y(y|X = −1)P[X = −1] > f_Y(y|X = +1)P[X = +1]


and it decides a “1” was transmitted otherwise. Use the results from part a to show 
that this decision rule is equivalent to: If y < T decide “0”; if y ≥ T decide “1”.


(c) What is the probability that the receiver makes an error given that a +1 was transmitted?
a −1 was transmitted? Assume σ² = 1/16.


(d) What is the overall probability of error? 




Chapter 4 


4.68. 


4.69. 


4.70. 


4.71. 


4.72. 


4.73. 


4.74. 


4.75. 
Two chips are being considered for use in a certain system. The lifetime of chip 1 is mod-
eled by a Gaussian random variable with mean 20,000 hours and standard deviation 
5000 hours. (The probability of negative lifetime is negligible.) The lifetime of chip 2 is 
also a Gaussian random variable but with mean 22,000 hours and standard deviation 
1000 hours. Which chip is preferred if the target lifetime of the system is 20,000 hours? 
24,000 hours? 
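One way to compare the two chips is to evaluate the probability that each lifetime exceeds the target. The sketch below uses the means and standard deviations stated in the problem and the standard-library `statistics.NormalDist`; the helper name `survival` is an assumption made for illustration.

```python
from statistics import NormalDist

# Lifetime models taken from the problem statement (hours).
chip1 = NormalDist(mu=20000, sigma=5000)
chip2 = NormalDist(mu=22000, sigma=1000)

def survival(dist, target):
    """P[lifetime > target] for a Gaussian lifetime model."""
    return 1.0 - dist.cdf(target)

if __name__ == "__main__":
    for target in (20000, 24000):
        print(target, survival(chip1, target), survival(chip2, target))
```

The chip with the larger survival probability at the target lifetime is the natural choice under this criterion; note how the larger spread of chip 1 helps it at the longer target.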

Passengers arrive at a taxi stand at an airport at a rate of one passenger per minute. The 
taxi driver will not leave until seven passengers arrive to fill his van. Suppose that pas- 
senger interarrival times are exponential random variables, and let X be the time to fill a 
van. Find the probability that more than 10 minutes elapse until the van is full. 
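The time to fill the van is a sum of seven independent exponential interarrival times, that is, an m-Erlang random variable with m = 7. A minimal sketch of its tail probability follows, assuming the standard identity P[X > t] = Σ_{k=0}^{m−1} e^{−λt}(λt)^k/k!; the function name is illustrative.

```python
import math

def erlang_tail(t, m, lam):
    """P[X > t] for an m-Erlang rv with rate lam (sum of m exponentials)."""
    # Equivalent to the probability of fewer than m Poisson arrivals in (0, t].
    return math.exp(-lam * t) * sum((lam * t) ** k / math.factorial(k)
                                    for k in range(m))

if __name__ == "__main__":
    # Seven passengers at one per minute, more than 10 minutes to fill the van.
    print(erlang_tail(10.0, m=7, lam=1.0))
```

The same function applies to any waiting-time question of this form, such as a stock of spare parts consumed at exponential intervals.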


(a) Show that the gamma random variable has mean:
E[X] = α/λ.
(b) Show that the gamma random variable has second moment, and variance given by:
E[X²] = α(α + 1)/λ² and VAR[X] = α/λ².


(c) Use parts a and b to obtain the mean and variance of an m-Erlang random variable. 

(d) Use parts a and b to obtain the mean and variance of a chi-square random variable. 

The time X to complete a transaction in a system is a gamma random variable with mean 

4 and variance 8. Use Octave to plot P[X > x] as a function of x. Note: Octave uses

B = 1/2. 

(a) Plot the pdf of an m-Erlang random variable for m = 1, 2, 3 and λ = 1.

(b) Plot the chi-square pdf for k = 1, 2, 3. 

A repair person keeps four widgets in stock. What is the probability that the widgets in 

stock will last 15 days if the repair person needs to replace widgets at an average rate of 

one widget every three days, where the time between widget failures is an exponential 

random variable? 

(a) Find the cdf of the m-Erlang random variable by integration of the pdf. Hint: Use in- 
tegration by parts. 

(b) Show that the derivative of the cdf given by Eq. (4.58) gives the pdf of an m-Erlang 
random variable. 

Plot the pdf of a beta random variable with:a = b = 1/4, 1, 4,8; a = 5,b = 1;a = 1,b = 3; 

a=2,b=5. 


Section 4.5: Functions of a Random Variable 


4.76. 


4.77. 


4.78. 


Let X be a Gaussian random variable with mean 2 and variance 4. The reward in a system 
is given by Y = (X)^+. Find the pdf of Y.
The amplitude of a radio signal X is a Rayleigh random variable with pdf:

\[
f_X(x) = \frac{x}{\alpha^2}\, e^{-x^2/2\alpha^2}, \qquad x > 0,\ \alpha > 0.
\]

(a) Find the pdf of Z = (X − r)^+.
(b) Find the pdf of Z = X².
A wire has length X, an exponential random variable with mean 5π cm. The wire is cut to


make rings of diameter 1 cm. Find the probability for the number of complete rings pro- 
duced by each length of wire. 


4.79. 


4.80. 


4.81. 


4.82. 
4.83. 


4.84. 


4.85. 


4.86. 


4.87. 
4.88. 


4.89. 
4.90. 


4.91. 


4.92. 


4.93. 


4.94. 
A signal that has amplitudes with a Gaussian pdf with zero mean and unit variance is ap-
plied to the quantizer in Example 4.27. 


(a) Pick d so that the probability that X falls outside the range of the quantizer is 1%. 
(b) Find the probability of the output levels of the quantizer. 


The signal X is amplified and shifted as follows: Y = 2X + 3, where X is the random 
variable in Problem 4.12. Find the cdf and pdf of Y. 


The net profit in a transaction is given by Y = 2 — 4X where X is the random variable in 
Problem 4.13. Find the cdf and pdf of Y. 


Find the cdf and pdf of the output of the limiter in Problem 4.54 parts b, c, and d. 


Find the cdf and pdf of the output of the limiter with center-level clipping in Problem 4.55 
parts b, c, and d. 


Find the cdf and pdf of Y = 3X + 2 in Problem 4.56 parts b, c, and d. 


The exam grades in a certain class have a Gaussian pdf with mean m and standard deviation
σ. Find the constants a and b so that the random variable Y = aX + b has a Gaussian
pdf with mean m′ and standard deviation σ′.


Let X = U^n where n is a positive integer and U is a uniform random variable in the unit
interval. Find the cdf and pdf of X. 


Repeat Problem 4.86 if U is uniform in the interval [—1, 1]. 
Let Y = |X| be the output of a full-wave rectifier with input voltage X. 


(a) Find the cdf of Y by finding the equivalent event of {Y ≤ y}. Find the pdf of Y by
differentiation of the cdf. 


(b) Find the pdf of Y by finding the equivalent event of {y < Y ≤ y + dy}. Does the
answer agree with part a? 


(c) What is the pdf of Y if f_X(x) is an even function of x?
Find and plot the cdf of Y in Example 4.34. 


A voltage X is a Gaussian random variable with mean 1 and variance 2. Find the pdf of 
the power dissipated by an R-ohm resistor P = RX².


Let Y = e^X.
(a) Find the cdf and pdf of Y in terms of the cdf and pdf of X. 


(b) Find the pdf of Y when X is a Gaussian random variable. In this case Y is said to be 
a lognormal random variable. Plot the pdf and cdf of Y when X is zero-mean with 
variance 1/8; repeat with variance 8. 


Let a radius be given by the random variable X in Problem 4.18. 
(a) Find the pdf of the area covered by a disc with radius X. 
(b) Find the pdf of the volume of a sphere with radius X. 

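(c) Find the pdf of the volume of a sphere in R^n:

\[
Y = \begin{cases} (2\pi)^{n/2} X^n / (2 \times 4 \times \cdots \times n) & \text{for } n \text{ even} \\ 2(2\pi)^{(n-1)/2} X^n / (1 \times 3 \times \cdots \times n) & \text{for } n \text{ odd.} \end{cases}
\]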


In the quantizer in Example 4.20, let Z = X − q(X). Find the pdf of Z if X is a Laplacian
random variable with parameter α = d/2.


Let Y = α tan πX, where X is uniformly distributed in the interval (−1, 1).
(a) Show that Y is a Cauchy random variable. 
(b) Find the pdf of Y = 1/X. 
4.95.


4.96. 


Let X be a Weibull random variable in Problem 4.15. Let Y = (X/λ)^β. Find the cdf and
pdf of Y. 


Find the pdf of X = —In(1 — U), where U is a uniform random variable in (0, 1). 
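A quick simulation can suggest what the pdf of X = −ln(1 − U) should look like before deriving it. The sketch below draws samples with Python's `random` module and compares the sample mean and an empirical tail probability with those of a unit-rate exponential random variable; the function names and the seed are arbitrary illustrative choices.

```python
import math
import random

def generate(n, seed=12345):
    """Return n samples of X = -ln(1 - U), with U uniform in [0, 1)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) for _ in range(n)]

def empirical_tail(samples, x):
    """Fraction of samples exceeding x, an estimate of P[X > x]."""
    return sum(s > x for s in samples) / len(samples)

if __name__ == "__main__":
    xs = generate(100_000)
    print("sample mean:", sum(xs) / len(xs))
    print("P[X > 1] estimate:", empirical_tail(xs, 1.0), "vs", math.exp(-1.0))
```

Agreement with e^{−x} is only empirical evidence; the problem still asks for the derivation through the cdf of X.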


Section 4.6: The Markov and Chebyshev Inequalities 


4.97. 


4.98. 


4.99. 


4.100. 


4.101. 


Compare the Markov inequality and the exact probability for the event {X > c} as a function
of c for:


(a) X is a uniform random variable in the interval [0, b].
(b) X is an exponential random variable with parameter λ.
(c) X is a Pareto random variable with α > 1.

(d) X is a Rayleigh random variable.


Compare the Markov inequality and the exact probability for the event {X > c} as a function
of c for:


(a) X is a uniform random variable in {1, 2, ..., L}.

(b) X is a geometric random variable.

(c) X is a Zipf random variable with L = 10; L = 100.

(d) X is a binomial random variable with n = 10, p = 0.5; n = 50, p = 0.5.


Compare the Chebyshev inequality and the exact probability for the event {|X − m| > c}
as a function of c for:


(a) X is a uniform random variable in the interval [−b, b].
(b) X is a Laplacian random variable with parameter α.
(c) X is a zero-mean Gaussian random variable.


(d) X is a binomial random variable with n = 10, p = 0.5; n = 50, p = 0.5.


Let X be the number of successes in n Bernoulli trials where the probability of success is 
p. Let Y = X/n be the average number of successes per trial. Apply the Chebyshev
inequality to the event {|Y − p| > a}. What happens as n → ∞?
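The behavior of the Chebyshev bound for the sample average Y = X/n can be seen numerically. The sketch assumes VAR[Y] = p(1 − p)/n, which follows from the binomial variance, and compares the bound p(1 − p)/(na²) with the exact binomial probability; the function names are illustrative.

```python
import math

def chebyshev_bound(n, p, a):
    """Chebyshev bound on P[|Y - p| > a], with VAR[Y] = p(1 - p)/n."""
    return min(1.0, p * (1.0 - p) / (n * a * a))

def exact_probability(n, p, a):
    """Exact P[|X/n - p| > a] for X binomial with parameters n and p."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > a:
            total += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
    return total

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, chebyshev_bound(n, 0.5, 0.1), exact_probability(n, 0.5, 0.1))
```

Both columns shrink as n grows, but the exact probability falls much faster than the 1/n rate of the bound.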

Suppose that light bulbs have exponentially distributed lifetimes with unknown mean 
E[X]. Suppose we measure the lifetime of n light bulbs, and we estimate the mean E[X]
by the arithmetic average Y of the measurements. Apply the Chebyshev inequality to the
event {|Y − E[X]| > a}. What happens as n → ∞? Hint: Use the m-Erlang random
variable. 


Section 4.7: Transform Methods 


4.102. 


4.103. 


4.104. 


(a) Find the characteristic function of the uniform random variable in [ —b, b]. 
(b) Find the mean and variance of X by applying the moment theorem. 

(a) Find the characteristic function of the Laplacian random variable. 

(b) Find the mean and variance of X by applying the moment theorem. 


Let ®y(w) be the characteristic function of an exponential random variable. What ran- 
dom variable does ®4,(w) correspond to? 


4.105. 


4.106. 


4.107. 
4.108. 


4.109. 


4.110. 


4.111. 


4.112. 


4.113. 


4.114. 


4.115. 


4.116. 
4.117. 


4.118. 


4.119. 


4.120. 


4.121. 
Find the mean and variance of the Gaussian random variable by applying the moment
theorem to the characteristic function given in Table 4.1. 


Find the characteristic function of Y = aX + b where X is a Gaussian random variable. 
Hint: Use Eq. (4.79). 
Show that the characteristic function for the Cauchy random variable is e^{−α|ω|}.


Find the Chernoff bound for the exponential random variable with A = 1. Compare the 
bound to the exact value for P[X > 5].
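A numerical check of the Chernoff bound for the exponential case can be sketched as follows. It assumes the bound has the form min over 0 < s < λ of e^{−sa} E[e^{sX}] with E[e^{sX}] = λ/(λ − s); setting the derivative of the exponent to zero gives s = λ − 1/a and the closed form λa e^{1−λa} for a > 1/λ. The function names are illustrative.

```python
import math

def chernoff_bound_exponential(a, lam=1.0):
    """Minimum over 0 <= s < lam of exp(-s*a) * lam / (lam - s)."""
    if a <= 1.0 / lam:
        return 1.0                      # minimum is attained at s = 0
    return lam * a * math.exp(1.0 - lam * a)

def exact_tail(a, lam=1.0):
    """Exact P[X > a] for an exponential rv with rate lam."""
    return math.exp(-lam * a)

if __name__ == "__main__":
    print("bound:", chernoff_bound_exponential(5.0))
    print("exact:", exact_tail(5.0))
```

For λ = 1 and a = 5 the bound is roughly an order of magnitude above the exact tail, which illustrates how loose an exponential-type bound can be at moderate thresholds.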


(a) Find the probability generating function of the geometric random variable. 

(b) Find the mean and variance of the geometric random variable from its pgf. 

(a) Find the pgf for the binomial random variable X with parameters n and p. 

(b) Find the mean and variance of X from the pgf. 

Let Gy(z) be the pgf for a binomial random variable with parameters n and p, and let 
Gy(z) be the pgf for a binomial random variable with parameters m and p. Consider the 
function Gy(z) Gy(z). Is this a valid pgf? If so, to what random variable does it corre- 
spond? 

Let Gy(z) be the pgf for a Poisson random variable with parameter a, and let Gy(z) be 
the pgf for a Poisson random variable with parameter β. Consider the function
Gy(z) Gy(z). Is this a valid pgf? If so, to what random variable does it correspond? 


Let N be a Poisson random variable with parameter a = 1. Compare the Chernoff bound 
and the exact value for P[ X = 5]. 


(a) Find the pgf Gy(z) for the discrete uniform random variable U. 
(b) Find the mean and variance from the pgf.


(c) Consider Gy(z)*. Does this function correspond to a pgf? If so, find the mean of the
corresponding random variable. 

(a) Find P[X = r] for the negative binomial random variable from the pgf in Table 3.1. 

(b) Find the mean of X. 

Derive Eq. (4.89). 

Obtain the nth moment of a gamma random variable from the Laplace transform of 

its pdf. 

Let X be the mixture of two exponential random variables (see Example 4.58). Find the 

Laplace transform of the pdf of X. 

The Laplace transform of the pdf of a random variable X is given by:

\[
X^*(s) = \frac{a}{s + a}\,\frac{b}{s + b}.
\]


Find the pdf of X. Hint: Use a partial fraction expansion of X*(s). 

Find a relationship between the Laplace transform of a gamma random variable pdf with
parameters α and λ and the Laplace transform of a gamma random variable with parameters
α − 1 and λ. What does this imply if X is an m-Erlang random variable?

(a) Find the Chernoff bound for P[X > t] for the gamma random variable.

(b) Compare the bound to the exact value of P[X ≥ 9] for an m = 3, λ = 1 Erlang
random variable.
Section 4.8: Basic Reliability Calculations


4.122. 


4.123. 


4.124. 


4.125. 


4.126. 


4.127. 


4.128. 


The lifetime T of a device has pdf 


110m 0<t<T 
t) = < 09AT) t>, 
T 0 
0 t< Tp. 


(a) Find the reliability and MTTF of the device. 

(b) Find the failure rate function. 

(c) How many hours of operation can be considered to achieve 99% reliability? 
The lifetime T of a device has pdf 


\[
f_T(t) = \begin{cases} 1/T_0 & a \le t \le a + T_0 \\ 0 & \text{elsewhere.} \end{cases}
\]


(a) Find the reliability and MTTF of the device. 

(b) Find the failure rate function. 

(c) How many hours of operation can be considered to achieve 99% reliability? 
The lifetime T of a device is a Rayleigh random variable. 

(a) Find the reliability of the device. 

(b) Find the failure rate function. Does r(t) increase with time? 

(c) Find the reliability of two devices that are in series. 

(d) Find the reliability of two devices that are in parallel. 

The lifetime T of a device is a Weibull random variable. 

(a) Plot the failure rates for α = 1 and β = 0.5; for α = 1 and β = 2.
(b) Plot the reliability functions in part a. 

(c) Plot the reliability of two devices that are in series. 

(d) Plot the reliability of two devices that are in parallel. 


A system starts with m devices, 1 active and m — 1 on standby. Each device has an expo- 
nential lifetime. When a device fails it is immediately replaced with another device (if one 


is still available). 
(a) Find the reliability of the system. 
(b) Find the failure rate function. 


Find the failure rate function of the memory chips discussed in Example 2.28. Plot 


ln(r(t)) versus αt.


A device comes from two sources. Devices from source 1 have mean m and exponentially 
distributed lifetimes. Devices from source 2 have mean m and Pareto-distributed lifetimes 


with α > 1. Assume a fraction p is from source 1 and a fraction 1 − p from source 2.
(a) Find the reliability of an arbitrarily selected device. 
(b) Find the failure rate function. 


4.129. 


4.130. 


4.131. 
4.132. 


4.133. 




A device has the failure rate function: 


\[
r(t) = \begin{cases} 1 + 9(1 - t) & 0 \le t < 1 \\ 1 & 1 \le t < 10 \\ 1 + 10(t - 10) & t \ge 10. \end{cases}
\]


Find the reliability function and the pdf of the device. 

A system has three identical components and the system is functioning if two or more 

components are functioning. 

(a) Find the reliability and MTTF of the system if the component lifetimes are expo- 
nential random variables with mean 1. 

(b) Find the reliability of the system if one of the components has mean 2. 
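The two-out-of-three system can be sketched numerically. The code assumes independent component lifetimes, exponential reliabilities R_i(t) = e^{−t/m_i}, and the standard relation MTTF = ∫_0^∞ R(t) dt evaluated by the trapezoid rule; the function names, integration horizon, and step count are illustrative choices.

```python
import math

def two_out_of_three(r1, r2, r3):
    """P[at least two of three independent components work]."""
    return r1 * r2 + r1 * r3 + r2 * r3 - 2.0 * r1 * r2 * r3

def system_reliability(t, means=(1.0, 1.0, 1.0)):
    """System reliability at time t for exponential lifetimes with given means."""
    r = [math.exp(-t / m) for m in means]
    return two_out_of_three(*r)

def mttf(means=(1.0, 1.0, 1.0), horizon=60.0, steps=60_000):
    """Trapezoid-rule estimate of the integral of R(t) over [0, horizon]."""
    h = horizon / steps
    total = 0.5 * (system_reliability(0.0, means) + system_reliability(horizon, means))
    total += sum(system_reliability(i * h, means) for i in range(1, steps))
    return total * h

if __name__ == "__main__":
    print("MTTF, identical components:", mttf())
    print("MTTF, one component with mean 2:", mttf(means=(2.0, 1.0, 1.0)))
```

With identical unit-mean components the estimate is close to 5/6, which is smaller than the MTTF of a single component even though the system is more reliable at small t.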

Repeat Problem 4.130 if the component lifetimes are Weibull distributed with β = 3.

A system consists of two processors and three peripheral units. The system is functioning 

as long as one processor and two peripherals are functioning. 

(a) Find the system reliability and MTTF if the processor lifetimes are exponential ran- 
dom variables with mean 5 and the peripheral lifetimes are Rayleigh random vari- 
ables with mean 10. 

(b) Find the system reliability and MTTF if the processor lifetimes are exponential ran- 
dom variables with mean 10 and the peripheral lifetimes are exponential random 
variables with mean 5. 

An operation is carried out by a subsystem consisting of three units that operate in a se- 

ries configuration. 

(a) The units have exponentially distributed lifetimes with mean 1. How many subsys- 
tems should be operated in parallel to achieve a reliability of 99% in T hours of 
operation? 

(b) Repeat part a with Rayleigh-distributed lifetimes. 

(c) Repeat part a with Weibull-distributed lifetimes with β = 3.


Section 4.9: Computer Methods for Generating Random Variables


4.134. 


4.135. 


Octave provides function calls to evaluate the pdf and cdf of important continuous random
variables. For example, the functions normal_cdf(x, m, var) and normal_pdf(x, m, var)
compute the cdf and pdf, respectively, at x for a Gaussian random variable with
mean m and variance var.


(a) Plot the conditional pdfs in Example 4.11 if v = +2 and the noise is zero-mean and 
unit variance. 

(b) Compare the cdf of the Gaussian random variable with the Chernoff bound ob- 
tained in Example 4.44. 


Plot the pdf and cdf of the gamma random variable for the following cases. 
(a) λ = 1 and α = 1, 2, 4.
(b) λ = 1/2 and α = 1/2, 1, 3/2, 5/2.
4.136.


4.137. 


4.138. 


4.139. 


4.140. 


4.141. 


4.142. 


One Random Variable 


The random variable X has the triangular pdf shown in Fig. P4.4. 
(a) Find the transformation needed to generate X. 


(b) Use Octave to generate 100 samples of X. Compare the empirical pdf of the samples 
with the desired pdf. 


FIGURE P4.4 


For each of the following random variables: find the transformation needed to generate
the random variable X; use Octave to generate 1000 samples of X; plot the sequence of
outcomes; compare the empirical pdf of the samples with the desired pdf.

(a) Laplacian random variable with α = 1.

(b) Pareto random variable with α = 1.5, 2, 2.5.

(c) Weibull random variable with β = 0.5, 2, 3 and λ = 1.

A random variable Y of mixed type has pdf


f_Y(x) = pδ(x) + (1 − p)f_X(x),


where X is a Laplacian random variable and p is a number between zero and one. Find 
the transformation required to generate Y. 


Specify the transformation method needed to generate the geometric random variable 
with parameter p = 1/2. Find the average number of comparisons needed in the search 
to determine each outcome. 


Specify the transformation method needed to generate the Poisson random variable with 
small parameter α. Compute the average number of comparisons needed in the search.
The following rejection method can be used to generate Gaussian random variables: 

1. Generate U_1, a uniform random variable in the unit interval.

2. Let X_1 = −ln(U_1).

3. Generate U_2, a uniform random variable in the unit interval. If U_2 ≤
exp{−(X_1 − 1)²/2}, accept X_1. Otherwise, reject X_1 and go to step 1.
4. Generate a random sign (+ or −) with equal probability. Output X equal to X_1
with the resulting sign.
(a) Show that if X_1 is accepted, then its pdf corresponds to the pdf of the absolute value
of a Gaussian random variable with mean 0 and variance 1.

(b) Show that X is a Gaussian random variable with mean 0 and variance 1.
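The four steps of the rejection method above can be sketched in Python as follows. The acceptance test is read as U_2 ≤ exp{−(X_1 − 1)²/2}; the function names, the seed, and the use of 1 − U in place of U (to avoid taking the logarithm of zero) are assumptions of this sketch, not part of the problem statement.

```python
import math
import random

def gaussian_by_rejection(rng):
    """One zero-mean, unit-variance Gaussian sample via the rejection steps."""
    while True:
        u1 = 1.0 - rng.random()          # step 1: uniform in (0, 1], avoids log(0)
        x1 = -math.log(u1)               # step 2: exponential candidate
        u2 = rng.random()                # step 3: uniform for the acceptance test
        if u2 <= math.exp(-((x1 - 1.0) ** 2) / 2.0):
            break                        # accept x1, otherwise repeat from step 1
    sign = 1.0 if rng.random() < 0.5 else -1.0   # step 4: random sign
    return sign * x1

def sample_moments(n, seed=2024):
    """Sample mean and variance of n generated values."""
    rng = random.Random(seed)
    xs = [gaussian_by_rejection(rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

if __name__ == "__main__":
    print(sample_moments(50_000))
```

A sample mean near 0 and a sample variance near 1 are consistent with parts (a) and (b), though they do not replace the requested proofs.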
Cheng (1977) has shown that the function Kfz(x) bounds the pdf of a gamma random 
variable with a > 1, where 


Naor} 
fz(x) = (a + x4? and K = (2a = 1). 


Find the cdf of fz(x) and the corresponding transformation needed to generate Z. 


4.143. 


4.144.


4.145. 


4.146. 
(a) Show that in the modified rejection method, the probability of accepting X_1 is 1/K.
Hint: Use conditional probability. 

(b) Show that Z has the desired pdf. 

Two methods for generating binomial random variables are: (1) Generate n Bernoulli 

random variables and add the outcomes; (2) Divide the unit interval according to bino- 

mial probabilities. Compare the methods under the following conditions: 

(a) p = 1/2,n = 5,25, 50; 

(b) p = 0.1,n = 5, 25,50. 

(c) Use Octave to implement the two methods by generating 1000 binomially distrib- 
uted samples. 

Let the number of event occurrences in a time interval be a Poisson random variable. In 

Section 3.4, it was found that the time between events for a Poisson random variable is an 

exponentially distributed random variable. 

(a) Explain how one can generate Poisson random variables from a sequence of expo- 
nentially distributed random variables. 

(b) How does this method compare with the one presented in Problem 4.140? 

(c) Use Octave to implement the two methods when α = 3, α = 25, and α = 100.
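One way to realize the idea in part (a) is to count how many exponential interarrival times fit in one unit of time. The sketch below is written in Python rather than Octave, and it assumes interarrival times with rate α so that the count has mean α; the function names and seed are illustrative.

```python
import math
import random

def poisson_from_exponentials(alpha, rng):
    """Count exponential interarrivals (rate alpha) that fit in one unit of time."""
    count = 0
    elapsed = -math.log(1.0 - rng.random()) / alpha   # time of the first event
    while elapsed <= 1.0:
        count += 1
        elapsed += -math.log(1.0 - rng.random()) / alpha
    return count

def sample_mean_and_variance(alpha, n, seed=99):
    """Sample mean and variance of n generated counts."""
    rng = random.Random(seed)
    xs = [poisson_from_exponentials(alpha, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

if __name__ == "__main__":
    for alpha in (3.0, 25.0, 100.0):
        print(alpha, sample_mean_and_variance(alpha, 20_000))
```

Each sample needs about α + 1 uniform draws, which is worth keeping in mind when comparing this approach with a table search for large α.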

Write a program to generate the gamma pdf with α > 1 using the rejection method
discussed in Problem 4.142. Use this method to generate m-Erlang random variables with
m = 2, 10 and λ = 1 and compare the method to the straightforward generation of m
exponential random variables as discussed in Example 4.57.


*Section 4.10: Entropy 


4.147. Let X be the outcome of the toss of a fair die.
(a) Find the entropy of X.
(b) Suppose you are told that X is even. What is the reduction in entropy?

4.148. A biased coin is tossed three times.
(a) Find the entropy of the outcome if the sequence of heads and tails is noted.
(b) Find the entropy of the outcome if the number of heads is noted.
(c) Explain the difference between the entropies in parts a and b.

4.149. Let X be the number of tails until the first heads in a sequence of tosses of a biased coin.
(a) Find the entropy of X given that X ≥ k.
(b) Find the entropy of X given that X ≤ k.

4.150. One of two coins is selected at random: Coin A has P[heads] = 1/10 and coin B has
P[heads] = 9/10.
(a) Suppose the coin is tossed once. Find the entropy of the outcome.
(b) Suppose the coin is tossed twice and the sequence of heads and tails is observed.
Find the entropy of the outcome.

4.151. Suppose that the randomly selected coin in Problem 4.150 is tossed until the first
occurrence of heads. Suppose that heads occurs in the kth toss. Find the entropy regarding the
identity of the coin.

4.152. A communication channel accepts input I from the set {0, 1, 2, 3, 4, 5, 6}. The channel
output is X = I + N mod 7, where N is equally likely to be +1 or −1.
(a) Find the entropy of I if all inputs are equiprobable.
(b) Find the entropy of I given that X = 4.

4.153. Let X be a discrete random variable with entropy \(H_X\).
(a) Find the entropy of Y = 2X.
(b) Find the entropy of any invertible transformation of X.

4.154. Let (X, Y) be the pair of outcomes from two independent tosses of a die.
(a) Find the entropy of X.
(b) Find the entropy of the pair (X, Y).
(c) Find the entropy in n independent tosses of a die. Explain why entropy is additive in
this case.

4.155. Let X be the outcome of the toss of a die, and let Y be a randomly selected integer less
than or equal to X.
(a) Find the entropy of Y.
(b) Find the entropy of the pair (X, Y) and denote it by H(X, Y).
(c) Find the entropy of Y given X = k and denote it by g(k) = H(Y | X = k). Find
E[g(X)] = E[H(Y | X)].
(d) Show that \(H(X, Y) = H_X + E[H(Y \mid X)]\). Explain the meaning of this equation.

4.156. Let X take on values from {1, 2, ..., K}. Suppose that P[X = K] = p, and let \(H_Y\) be the
entropy of X given that X is not equal to K. Show that
\(H_X = -p \ln p - (1 - p)\ln(1 - p) + (1 - p)H_Y\).

4.157. Let X be a uniform random variable in Example 4.62. Find and plot the entropy of Q as a
function of the variance of the error X − Q(X). Hint: Express the variance of the error
in terms of d and substitute into the expression for the entropy of Q.

4.158. A communication channel accepts as input either 000 or 111. The channel transmits each
binary input correctly with probability 1 − p and erroneously with probability p. Find
the entropy of the input given that the output is 000; given that the output is 010.

4.159. Let X be a uniform random variable in the interval [−a, a]. Suppose we are told that
X is positive. Use the approach in Example 4.62 to find the reduction in entropy. Show
that this is equal to the difference of the differential entropy of X and the differential
entropy of X given {X > 0}.

4.160. Let X be uniform in [a, b], and let Y = 2X. Compare the differential entropies of X and
Y. How does this result differ from the result in Problem 4.153?

4.161. Find the pmf for the random variable X for which the sequence of questions in Fig. 4.26(a)
is optimum.

4.162. Let the random variable X have \(S_X\) = {1, 2, 3, 4, 5, 6} and pmf (3/8, 3/8, 1/8, 1/16, 1/32,
1/32). Find the entropy of X. What is the best code you can find for X?

4.163. Seven cards are drawn from a deck of 52 distinct cards. How many bits are required to
represent all possible outcomes?

4.164. Find the optimum encoding for the geometric random variable with p = 1/2.

4.165. An urn experiment has 10 equiprobable distinct outcomes. Find the performance of the
best tree code for encoding (a) a single outcome of the experiment; (b) a sequence of n
outcomes of the experiment.

4.166. A binary information source produces n outputs. Suppose we are told that there are k 1’s
in these n outputs.
(a) What is the best code to indicate which pattern of k 1’s and n − k 0’s occurred?
(b) How many bits are required to specify the value of k using a code with a fixed number
of bits?


4.167. The random variable X takes on values from the set {1, 2, 3, 4}. Find the maximum
entropy pmf for X given that E[X] = 2.

4.168. The random variable X is nonnegative. Find the maximum entropy pdf for X given that
E[X] = 10.

4.169. Find the maximum entropy pdf of X given that \(E[X^2] = c\).

4.170. Suppose we are given two parameters of the random variable X, \(E[g_1(X)] = c_1\) and
\(E[g_2(X)] = c_2\).
(a) Show that the maximum entropy pdf for X has the form
\(f_X(x) = C e^{-\lambda_1 g_1(x) - \lambda_2 g_2(x)}\).
(b) Find the entropy of X.

4.171. Find the maximum entropy pdf of X given that E[X] = m and VAR[X] = \(\sigma^2\).


Problems Requiring Cumulative Knowledge 


4.172. Three types of customers arrive at a service station. The time required to service type 1
customers is an exponential random variable with mean 2. Type 2 customers have a Pareto
distribution with α = 3 and \(x_m = 1\). Type 3 customers require a constant service time
of 2 seconds. Suppose that the proportion of type 1, 2, and 3 customers is 1/2, 1/8, and 3/8,
respectively. Find the probability that an arbitrary customer requires more than 15 seconds
of service time. Compare the above probability to the bound provided by the
Markov inequality.

4.173. The lifetime X of a light bulb is a random variable with

P[X > t] = 2/(2 + t) for t > 0.

Suppose three new light bulbs are installed at time t = 0. At time t = 1 all three light
bulbs are still working. Find the probability that at least one light bulb is still working at
time t = 9.

4.174. The random variable X is uniformly distributed in the interval [0, a]. Suppose a is
unknown, so we estimate a by the maximum value observed in n independent repetitions of
the experiment; that is, we estimate a by \(Y = \max\{X_1, X_2, \ldots, X_n\}\).
(a) Find P[Y ≤ y].
(b) Find the mean and variance of Y, and explain why Y is a good estimate for a when n
is large.

4.175. The sample X of a signal is a Gaussian random variable with m = 0 and \(\sigma^2 = 1\). Suppose
that X is quantized by a nonuniform quantizer consisting of four intervals:
(−∞, −a], (−a, 0], (0, a], and (a, ∞).
(a) Find the value of a so that X is equally likely to fall in each of the four intervals.
(b) Find the representation point \(x_1 = q(X)\) for X in (0, a] that minimizes the mean-
squared error, that is,

\[
\int_0^a (x - x_1)^2 f_X(x)\,dx \quad \text{is minimized.}
\]

Hint: Differentiate the above expression with respect to \(x_1\). Find the representation
points for the other intervals.
(c) Evaluate the mean-squared error of the quantizer \(E[(X - q(X))^2]\).

4.176. The output Y of a binary communication system is a unit-variance Gaussian random
variable with mean zero when the input is “0” and mean one when the input is “1”. Assume the
input is 1 with probability p.
(a) Find P[input is 1 | y < Y < y + h] and P[input is 0 | y < Y < y + h].
(b) The receiver uses the following decision rule:
If P[input is 1 | y < Y < y + h] > P[input is 0 | y < Y < y + h], decide input
was 1; otherwise, decide input was 0.
Show that this decision rule leads to the following threshold rule:
If Y > T, decide input was 1; otherwise, decide input was 0.
(c) What is the probability of error for the above decision rule?

CHAPTER 5

Pairs of Random Variables


Many random experiments involve several random variables. In some experiments a
number of different quantities are measured. For example, the voltage signals at several
points in a circuit at some specific time may be of interest. Other experiments involve
the repeated measurement of a certain quantity such as the repeated
measurement (“sampling”) of the amplitude of an audio or video signal that varies
with time. In Chapter 4 we developed techniques for calculating the probabilities of
events involving a single random variable in isolation. In this chapter, we extend the
concepts already introduced to two random variables:


• We use the joint pmf, cdf, and pdf to calculate the probabilities of events that
involve the joint behavior of two random variables;

• We use expected value to define joint moments that summarize the behavior of
two random variables;

• We determine when two random variables are independent, and we quantify
their degree of “correlation” when they are not independent;

• We obtain conditional probabilities involving a pair of random variables.


In a sense we have already covered all the fundamental concepts of probability
and random variables, and we are “simply” elaborating on the case of two or more random
variables. Nevertheless, there are significant analytical techniques that need to be
learned, e.g., double summations of pmf’s and double integration of pdf’s, so we first
discuss the case of two random variables in detail because we can draw on our geometric
intuition. Chapter 6 considers the general case of vector random variables. Throughout
these two chapters you should be mindful of the forest (fundamental concepts) and
the trees (specific techniques)!


5.1 TWO RANDOM VARIABLES


The notion of a random variable as a mapping is easily generalized to the case where
two quantities are of interest. Consider a random experiment with sample space S and
event class F. We are interested in a function that assigns a pair of real numbers
\(\mathbf{X}(\zeta) = (X(\zeta), Y(\zeta))\) to each outcome ζ in S. Basically we are dealing with a vector
function that maps S into \(R^2\), the real plane, as shown in Fig. 5.1(a). We are ultimately
interested in events involving the pair (X, Y).

FIGURE 5.1
(a) A function assigns a pair of real numbers to each outcome
in S. (b) Equivalent events for two random variables.


Example 5.1 


Let a random experiment consist of selecting a student’s name from an urn. Let ζ denote the
outcome of this experiment, and define the following two functions:


H(ζ) = height of student ζ in centimeters


W(ζ) = weight of student ζ in kilograms


(H(ζ), W(ζ)) assigns a pair of numbers to each ζ in S.

We are interested in events involving the pair (H, W). For example, the event
B = {H ≤ 183, W ≤ 82} represents students with height less than 183 cm (6 feet) and weight less
than 82 kg (180 lb).


Example 5.2 


A Web page provides the user with a choice either to watch a brief ad or to move directly to the
requested page. Let ζ be the patterns of user arrivals in T seconds, e.g., number of arrivals, and
listing of arrival times and types. Let \(N_1(\zeta)\) be the number of times the Web page is directly
requested and let \(N_2(\zeta)\) be the number of times that the ad is chosen. \((N_1(\zeta), N_2(\zeta))\) assigns a pair
of nonnegative integers to each ζ in S. Suppose that a type 1 request brings 0.001¢ in revenue
and a type 2 request brings in 1¢. Find the event “revenue in T seconds is less than $100.”

The total revenue in T seconds is \(0.001 N_1 + 1 N_2\) cents, and so the event of interest is
\(B = \{0.001 N_1 + 1 N_2 < 10{,}000\}\).


Example 5.3


Let the outcome ζ in a random experiment be the length of a randomly selected message. Suppose
that messages are broken into packets of maximum length M bytes. Let Q be the number of
full packets in a message and let R be the number of bytes left over. (Q(ζ), R(ζ)) assigns a pair
of numbers to each ζ in S. Q takes on values in the range 0, 1, 2, ..., and R takes on values in the
range 0, 1, ..., M − 1. An event of interest may be B = {R < M/2}, “the last packet is less than
half full.”


Example 5.4 


Let the outcome of a random experiment result in a pair \(\zeta = (\zeta_1, \zeta_2)\) that results from two
independent spins of a wheel. Each spin of the wheel results in a number in the interval (0, 2π].
Define the pair of numbers (X, Y) in the plane as follows:

\[
X(\zeta) = \left(2 \ln \frac{2\pi}{\zeta_1}\right)^{1/2} \cos \zeta_2,
\qquad
Y(\zeta) = \left(2 \ln \frac{2\pi}{\zeta_1}\right)^{1/2} \sin \zeta_2.
\]

The vector function (X(ζ), Y(ζ)) assigns a pair of numbers in the plane to each ζ in S. The
square root term corresponds to a radius and \(\zeta_2\) to an angle.

We will see that (X, Y) models the noise voltages encountered in digital communication
systems. An event of interest here may be \(B = \{X^2 + Y^2 < r^2\}\), “total noise power is less
than \(r^2\).”
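The mapping in Example 5.4 can be explored numerically. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes that each wheel spin is uniformly distributed on (0, 2π], which the example does not state explicitly. Under that assumption, \(X^2 + Y^2 = 2\ln(2\pi/\zeta_1)\), so the event B should have probability \(1 - e^{-r^2/2}\), and the relative frequency estimated below should be close to that value.

```python
import math
import random


def noise_pair(rng):
    """Map two wheel spins (zeta1, zeta2) to the pair (X, Y) of Example 5.4."""
    # 1 - rng.random() lies in (0, 1], so each spin lies in (0, 2*pi].
    zeta1 = 2 * math.pi * (1.0 - rng.random())
    zeta2 = 2 * math.pi * (1.0 - rng.random())
    radius = math.sqrt(2 * math.log(2 * math.pi / zeta1))
    return radius * math.cos(zeta2), radius * math.sin(zeta2)


def estimate_power_event(r, trials=100_000, seed=1):
    """Relative frequency of the event {X^2 + Y^2 < r^2}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = noise_pair(rng)
        if x * x + y * y < r * r:
            hits += 1
    return hits / trials
```

For example, `estimate_power_event(1.0)` can be compared with `1 - math.exp(-0.5)`.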


The events involving a pair of random variables (X, Y) are specified by conditions 
that we are interested in and can be represented by regions in the plane. Figure 5.2 
shows three examples of events: 


\[
\begin{aligned}
A &= \{X + Y \le 10\} \\
B &= \{\min(X, Y) \le 5\} \\
C &= \{X^2 + Y^2 \le 100\}.
\end{aligned}
\]

Event A divides the plane into two regions according to a straight line. Note that the
event in Example 5.2 is of this type. Event C identifies a disk centered at the origin and
it corresponds to the event in Example 5.4. Event B is found by noting that
{min(X, Y) ≤ 5} = {X ≤ 5} ∪ {Y ≤ 5}, that is, the minimum of X and Y is less
than or equal to 5 if either X and/or Y is less than or equal to 5.

FIGURE 5.2
Examples of two-dimensional events.

To determine the probability that the pair \(\mathbf{X} = (X, Y)\) is in some region B in the
plane, we proceed as in Chapter 3 to find the equivalent event for B in the underlying
sample space S:

\[
A = \mathbf{X}^{-1}(B) = \{\zeta : (X(\zeta), Y(\zeta)) \text{ in } B\}. \tag{5.1a}
\]

The relationship between \(A = \mathbf{X}^{-1}(B)\) and B is shown in Fig. 5.1(b). If A is in F, then
it has a probability assigned to it, and we obtain:

\[
P[\mathbf{X} \text{ in } B] = P[A] = P[\{\zeta : (X(\zeta), Y(\zeta)) \text{ in } B\}]. \tag{5.1b}
\]


The approach is identical to what we followed in the case of a single random variable. 
The only difference is that we are considering the joint behavior of X and Y that is in- 
duced by the underlying random experiment. 

A scattergram can be used to deduce the joint behavior of two random variables. 
A scattergram plot simply places a dot at every observation pair (x, y) that results from 
performing the experiment that generates (X, Y). Figure 5.3 shows the scattergram for 
200 observations of four different pairs of random variables. The pairs in Fig. 5.3(a) ap- 
pear to be uniformly distributed in the unit square. The pairs in Fig. 5.3(b) are clearly 
confined to a disc of unit radius and appear to be more concentrated near the origin. 
The pairs in Fig. 5.3(c) are concentrated near the origin, and appear to have circular 
symmetry, but are not bounded to an enclosed region. The pairs in Fig. 5.3(d) again are 
concentrated near the origin and appear to have a clear linear relationship of some 
sort, that is, larger values of x tend to have linearly proportional increasing values of y. 
We later introduce various functions and moments to characterize the behavior of 
pairs of random variables illustrated in these examples. 

The joint probability mass function, joint cumulative distribution function, and 
joint probability density function provide approaches to specifying the probability law 
that governs the behavior of the pair (X, Y). Our general approach is as follows. We 
first focus on events that correspond to rectangles in the plane: 


\[
B = \{X \text{ in } A_1\} \cap \{Y \text{ in } A_2\} \tag{5.2}
\]

where \(A_k\) is a one-dimensional event (i.e., subset of the real line). We say that these
events are of product form. The event B occurs when both \(\{X \text{ in } A_1\}\) and \(\{Y \text{ in } A_2\}\)
occur jointly. Figure 5.4 shows some two-dimensional product-form events. We use Eq.
(5.1b) to find the probability of product-form events:

\[
P[B] = P[\{X \text{ in } A_1\} \cap \{Y \text{ in } A_2\}] \triangleq P[X \text{ in } A_1, Y \text{ in } A_2]. \tag{5.3}
\]

By defining the \(A_k\) appropriately we then obtain the joint pmf, joint cdf, and joint pdf of
(X, Y).


5.2 PAIRS OF DISCRETE RANDOM VARIABLES


Let the vector random variable \(\mathbf{X} = (X, Y)\) assume values from some countable set
\(S_{X,Y} = \{(x_j, y_k),\ j = 1, 2, \ldots,\ k = 1, 2, \ldots\}\). The joint probability mass function of \(\mathbf{X}\)
specifies the probabilities of the event {X = x} ∩ {Y = y}:

\[
\begin{aligned}
p_{X,Y}(x, y) &= P[\{X = x\} \cap \{Y = y\}] \\
&\triangleq P[X = x, Y = y] \qquad \text{for } (x, y) \in R^2.
\end{aligned} \tag{5.4a}
\]

The values of the pmf on the set \(S_{X,Y}\) provide the essential information:

\[
\begin{aligned}
p_{X,Y}(x_j, y_k) &= P[\{X = x_j\} \cap \{Y = y_k\}] \\
&\triangleq P[X = x_j, Y = y_k] \qquad (x_j, y_k) \in S_{X,Y}.
\end{aligned} \tag{5.4b}
\]

FIGURE 5.3
A scattergram for 200 observations of four different pairs of random variables.

FIGURE 5.4
Some two-dimensional product-form events.


There are several ways of showing the pmf graphically: (1) For small sample
spaces we can present the pmf in the form of a table as shown in Fig. 5.5(a). (2) We can
present the pmf using arrows of height \(p_{X,Y}(x_j, y_k)\) placed at the points \(\{(x_j, y_k)\}\) in
the plane, as shown in Fig. 5.5(b), but this can be difficult to draw. (3) We can place dots
at the points \(\{(x_j, y_k)\}\) and label these with the corresponding pmf value as shown in
Fig. 5.5(c).

The probability of any event B is the sum of the pmf over the outcomes in B:

\[
P[\mathbf{X} \text{ in } B] = \sum\sum_{(x_j, y_k) \text{ in } B} p_{X,Y}(x_j, y_k). \tag{5.5}
\]

Frequently it is helpful to sketch the region that contains the points in B as shown, for
example, in Fig. 5.6. When the event B is the entire sample space \(S_{X,Y}\), we have:

\[
\sum_{j=1}^{\infty} \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k) = 1. \tag{5.6}
\]


Example 5.5 


A packet switch has two input ports and two output ports. At a given time slot a packet arrives at 
each input port with probability 1/2, and is equally likely to be destined to output port 1 or 2. Let 
X and Y be the number of packets destined for output ports 1 and 2, respectively. Find the pmf 
of X and Y, and show the pmf graphically. 

The outcome \(I_j\) for an input port j can take the following values: “n”, no packet arrival
(with probability 1/2); “a1”, packet arrival destined for output port 1 (with probability 1/4); “a2”,
packet arrival destined for output port 2 (with probability 1/4). The underlying sample space S
consists of the pair of input outcomes \(\zeta = (I_1, I_2)\). The mapping for (X, Y) is shown in the table
below:


ζ    | (n,n) | (n,a1) | (n,a2) | (a1,n) | (a1,a1) | (a1,a2) | (a2,n) | (a2,a1) | (a2,a2)

X,Y  | (0,0) | (1,0)  | (0,1)  | (1,0)  | (2,0)   | (1,1)   | (0,1)  | (1,1)   | (0,2)


The pmf of (X, Y) is then:

\[
\begin{aligned}
p_{X,Y}(0,0) &= P[\zeta = (n, n)] = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}, \\
p_{X,Y}(0,1) &= P[\zeta \in \{(n, a2), (a2, n)\}] = 2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{4} = \tfrac{1}{4}, \\
p_{X,Y}(1,0) &= P[\zeta \in \{(n, a1), (a1, n)\}] = \tfrac{1}{4}, \\
p_{X,Y}(1,1) &= P[\zeta \in \{(a1, a2), (a2, a1)\}] = \tfrac{1}{8}, \\
p_{X,Y}(0,2) &= P[\zeta = (a2, a2)] = \tfrac{1}{16}, \\
p_{X,Y}(2,0) &= P[\zeta = (a1, a1)] = \tfrac{1}{16}.
\end{aligned}
\]

FIGURE 5.5
Graphical representations of pmf’s: (a) in table format; (b) use of arrows to show height;
(c) labeled dots corresponding to pmf value.

FIGURE 5.6
Showing the pmf via a sketch containing the points in B.

Figure 5.5(a) shows the pmf in tabular form where the number of rows and columns
accommodate the range of X and Y respectively. Each entry in the table gives the pmf value for the
corresponding x and y. Figure 5.5(b) shows the pmf using arrows in the plane. An arrow of height
\(p_{X,Y}(j, k)\) is placed at each of the points in \(S_{X,Y}\) = {(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0)}.
Figure 5.5(c) shows the pmf using labeled dots in the plane. A dot with label \(p_{X,Y}(j, k)\) is placed
at each of the points in \(S_{X,Y}\).
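The enumeration in Example 5.5 can be reproduced mechanically. The short Python sketch below is an illustration only: the names `PORT_OUTCOMES` and `joint_pmf` are hypothetical, and it assumes the two input ports behave independently, as the product probabilities in the example suggest. It lists the nine outcomes ζ, maps each to (X, Y), and accumulates the probabilities as exact fractions.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Outcome probabilities for a single input port, as given in Example 5.5.
PORT_OUTCOMES = {"n": Fraction(1, 2), "a1": Fraction(1, 4), "a2": Fraction(1, 4)}


def joint_pmf():
    """Joint pmf of (X, Y), assuming the two input ports are independent."""
    pmf = defaultdict(Fraction)
    for i1, i2 in product(PORT_OUTCOMES, repeat=2):
        x = (i1 == "a1") + (i2 == "a1")  # packets destined for output port 1
        y = (i1 == "a2") + (i2 == "a2")  # packets destined for output port 2
        pmf[(x, y)] += PORT_OUTCOMES[i1] * PORT_OUTCOMES[i2]
    return dict(pmf)
```

Calling `joint_pmf()` returns a dictionary keyed by the points of the range of (X, Y), and the values sum to one, in agreement with Eq. (5.6).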


Example 5.6 


A random experiment consists of tossing two “loaded” dice and noting the pair of numbers
(X, Y) facing up. The joint pmf \(p_{X,Y}(j, k)\) for j = 1, ..., 6 and k = 1, ..., 6 is given by the two-
dimensional table shown in Fig. 5.6. The (j, k) entry in the table contains the value \(p_{X,Y}(j, k)\).
Find the P[min(X, Y) = 3].

Figure 5.6 shows the region that corresponds to the set {min(x, y) = 3}. The probability
of this event is given by:

\[
\begin{aligned}
P[\min(X, Y) = 3] &= p_{X,Y}(6,3) + p_{X,Y}(5,3) + p_{X,Y}(4,3) \\
&\quad + p_{X,Y}(3,3) + p_{X,Y}(3,4) + p_{X,Y}(3,5) + p_{X,Y}(3,6) \\
&= 6\left(\frac{1}{42}\right) + \frac{2}{42} = \frac{8}{42}.
\end{aligned}
\]


5.2.1 Marginal Probability Mass Function


The joint pmf of X provides the information about the joint behavior of X and Y. We 

are also interested in the probabilities of events involving each of the random variables 
in isolation. These can be found in terms of the marginal probability mass functions: 

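\[
\begin{aligned}
p_X(x_j) &= P[X = x_j] \\
&= P[X = x_j, Y = \text{anything}] \\
&= P[\{X = x_j \text{ and } Y = y_1\} \cup \{X = x_j \text{ and } Y = y_2\} \cup \cdots] \\
&= \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k),
\end{aligned} \tag{5.7a}
\]

and similarly,

\[
\begin{aligned}
p_Y(y_k) &= P[Y = y_k] \\
&= \sum_{j=1}^{\infty} p_{X,Y}(x_j, y_k).
\end{aligned} \tag{5.7b}
\]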


The marginal pmf’s satisfy all the properties of one-dimensional pmf’s, and they 
supply the information required to compute the probability of events involving the 
corresponding random variable. 

The probability \(p_{X,Y}(x_j, y_k)\) can be interpreted as the long-term relative frequency
of the joint event \(\{X = x_j\} \cap \{Y = y_k\}\) in a sequence of repetitions of the random
experiment. Equation (5.7a) corresponds to the fact that the relative frequency of the
event \(\{X = x_j\}\) is found by adding the relative frequencies of all outcome pairs in which
\(x_j\) appears. In general, it is impossible to deduce the relative frequencies of pairs of values
X and Y from the relative frequencies of X and Y in isolation. The same is true for pmf’s:
In general, knowledge of the marginal pmf’s is insufficient to specify the joint pmf.


Example 5.7 


Find the marginal pmf for the output ports (X, Y) in Example 5.5.
Figure 5.5(a) shows that the marginal pmf is found by adding entries along a row or column
in the table. For example, by adding along the x = 1 column we have:

\[
p_X(1) = P[X = 1] = p_{X,Y}(1,0) + p_{X,Y}(1,1) = \frac{1}{4} + \frac{1}{8} = \frac{3}{8}.
\]

Similarly, by adding along the y = 0 row:

\[
p_Y(0) = P[Y = 0] = p_{X,Y}(0,0) + p_{X,Y}(1,0) + p_{X,Y}(2,0) = \frac{1}{4} + \frac{1}{4} + \frac{1}{16} = \frac{9}{16}.
\]

Figure 5.5(b) shows the marginal pmf using arrows on the real line.


Example 5.8


Find the marginal pmf’s in the loaded dice experiment in Example 5.6.
The probability that X = 1 is found by summing over the first row:

\[
P[X = 1] = \frac{2}{42} + \frac{1}{42} + \cdots + \frac{1}{42} = \frac{1}{6}.
\]

Similarly, we find that P[X = j] = 1/6 for j = 2, ..., 6. The probability that Y = k is found by
summing over the kth column. We then find that P[Y = k] = 1/6 for k = 1, 2, ..., 6. Thus each
die, in isolation, appears to be fair in the sense that each face is equiprobable. If we knew only
these marginal pmf’s we would have no idea that the dice are loaded.
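The row and column sums of Eqs. (5.7a) and (5.7b) are easy to mechanize. The Python sketch below is illustrative only: the dictionary `JOINT` holds the packet switch pmf values that follow from the probabilities stated in Example 5.5, and the function name `marginals` is a hypothetical choice.

```python
from fractions import Fraction as F

# Joint pmf of the packet switch example, keyed by (x, y).
JOINT = {
    (0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 4),
    (1, 1): F(1, 8), (0, 2): F(1, 16), (2, 0): F(1, 16),
}


def marginals(joint):
    """Return (p_X, p_Y) by summing the joint pmf over the other variable."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p  # Eq. (5.7a): sum over y for fixed x
        p_y[y] = p_y.get(y, 0) + p  # Eq. (5.7b): sum over x for fixed y
    return p_x, p_y
```

Different joint pmf's can produce the same output from `marginals`, which is the point made by the loaded dice in Example 5.8.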


Example 5.9 


In Example 5.3, let the number of bytes N in a message have a geometric distribution with
parameter 1 − p and range \(S_N\) = {0, 1, 2, ...}. Find the joint pmf and the marginal pmf’s of Q and R.

If a message has N bytes, then the number of full packets is the quotient Q in the division
of N by M, and the number of remaining bytes is the remainder R. The probability of the pair
{(q, r)} is given by

\[
P[Q = q, R = r] = P[N = qM + r] = (1 - p)p^{qM + r}.
\]

The marginal pmf of Q is

\[
\begin{aligned}
P[Q = q] &= P[N \text{ in } \{qM, qM + 1, \ldots, qM + (M - 1)\}] \\
&= \sum_{k=0}^{M-1} (1 - p)p^{qM + k} \\
&= (1 - p)p^{qM}\,\frac{1 - p^{M}}{1 - p} = (1 - p^{M})(p^{M})^{q}, \qquad q = 0, 1, 2, \ldots
\end{aligned}
\]

The marginal pmf of Q is geometric with parameter \(p^M\). The marginal pmf of R is:

\[
\begin{aligned}
P[R = r] &= P[N \text{ in } \{r, M + r, 2M + r, \ldots\}] \\
&= \sum_{q=0}^{\infty} (1 - p)p^{qM + r} = \frac{(1 - p)}{1 - p^{M}}\,p^{r}, \qquad r = 0, 1, \ldots, M - 1.
\end{aligned}
\]

R has a truncated geometric pmf. As an exercise, you should verify that all the above marginal
pmf’s add to 1.
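The closed forms in Example 5.9 can be checked numerically. In the Python sketch below the function names are hypothetical and the values p = 0.9 and M = 4 used in the usage note are arbitrary illustrative choices, not taken from the text. Summing the joint pmf over r should reproduce the marginal of Q, and summing over q (truncated at a large index) should approach the marginal of R.

```python
def joint_qr(q, r, p, M):
    """P[Q = q, R = r] = P[N = q*M + r] for the pmf P[N = n] = (1 - p) * p**n."""
    return (1 - p) * p ** (q * M + r)


def marginal_q(q, p, M):
    """Closed form for P[Q = q]: geometric with parameter p**M."""
    return (1 - p ** M) * (p ** M) ** q


def marginal_r(r, p, M):
    """Closed form for P[R = r]: truncated geometric on 0, ..., M - 1."""
    return (1 - p) * p ** r / (1 - p ** M)
```

For instance, with `p, M = 0.9, 4`, the sum `sum(joint_qr(2, r, p, M) for r in range(M))` can be compared with `marginal_q(2, p, M)`.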


5.3 THE JOINT CDF OF X AND Y


In Chapter 3 we saw that semi-infinite intervals of the form (−∞, x] are a basic building
block from which other one-dimensional events can be built. By defining the cdf
\(F_X(x)\) as the probability of (−∞, x], we were then able to express the probabilities of
other events in terms of the cdf. In this section we repeat the above development for
two-dimensional random variables.

FIGURE 5.7
The joint cumulative distribution function is defined as
the probability of the semi-infinite rectangle defined by
the point \((x_1, y_1)\).


A basic building block for events involving two-dimensional random variables is
the semi-infinite rectangle defined by \(\{(x, y) : x \le x_1 \text{ and } y \le y_1\}\), as shown in Fig. 5.7.
We also use the more compact notation \(\{x \le x_1, y \le y_1\}\) to refer to this region. The
joint cumulative distribution function of X and Y is defined as the probability of the
event \(\{X \le x_1\} \cap \{Y \le y_1\}\):

\[
F_{X,Y}(x_1, y_1) = P[X \le x_1, Y \le y_1]. \tag{5.8}
\]

In terms of relative frequency, \(F_{X,Y}(x_1, y_1)\) represents the long-term proportion
of time in which the outcome of the random experiment yields a point \(\mathbf{X}\) that falls in
the rectangular region shown in Fig. 5.7. In terms of probability “mass,” \(F_{X,Y}(x_1, y_1)\)
represents the amount of mass contained in the rectangular region.

The joint cdf satisfies the following properties. 


(i) The joint cdf is a nondecreasing function of x and y:

\[
F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2) \quad \text{if } x_1 \le x_2 \text{ and } y_1 \le y_2. \tag{5.9a}
\]

(ii)
\[
F_{X,Y}(x_1, -\infty) = 0, \quad F_{X,Y}(-\infty, y_1) = 0, \quad F_{X,Y}(\infty, \infty) = 1. \tag{5.9b}
\]

(iii) We obtain the marginal cumulative distribution functions by removing the
constraint on one of the variables. The marginal cdf’s are the probabilities of
the regions shown in Fig. 5.8:

\[
F_X(x_1) = F_{X,Y}(x_1, \infty) \quad \text{and} \quad F_Y(y_1) = F_{X,Y}(\infty, y_1). \tag{5.9c}
\]

(iv) The joint cdf is continuous from the “north” and from the “east,” that is,

\[
\lim_{x \to a^{+}} F_{X,Y}(x, y) = F_{X,Y}(a, y) \quad \text{and} \quad \lim_{y \to b^{+}} F_{X,Y}(x, y) = F_{X,Y}(x, b). \tag{5.9d}
\]

(v) The probability of the rectangle \(\{x_1 < x \le x_2,\ y_1 < y \le y_2\}\) is given by:

\[
P[x_1 < X \le x_2,\ y_1 < Y \le y_2] =
F_{X,Y}(x_2, y_2) - F_{X,Y}(x_2, y_1) - F_{X,Y}(x_1, y_2) + F_{X,Y}(x_1, y_1). \tag{5.9e}
\]

FIGURE 5.8
The marginal cdf’s are the probabilities of these half-planes:
\(F_X(x_1) = P[X \le x_1, Y < \infty]\) and \(F_Y(y_1) = P[X < \infty, Y \le y_1]\).


Property (i) follows by noting that the semi-infinite rectangle defined by \((x_1, y_1)\) is
contained in that defined by \((x_2, y_2)\) and applying Corollary 7. Properties (ii) to (iv)
are obtained by limiting arguments. For example, the sequence \(\{x \le x_1 \text{ and } y \le -n\}\)
is decreasing and approaches the empty set ∅, so

\[
F_{X,Y}(x_1, -\infty) = \lim_{n \to \infty} F_{X,Y}(x_1, -n) = P[\emptyset] = 0.
\]

For property (iii) we take the sequence \(\{x \le x_1 \text{ and } y \le n\}\) which increases to
\(\{x \le x_1\}\), so

\[
\lim_{n \to \infty} F_{X,Y}(x_1, n) = P[X \le x_1] = F_X(x_1).
\]

For property (v) note in Fig. 5.9(a) that
\(B = \{x_1 < x \le x_2, y \le y_1\} = \{X \le x_2, Y \le y_1\} - \{X \le x_1, Y \le y_1\}\),
so \(P[B] = P[x_1 < X \le x_2, Y \le y_1] = F_{X,Y}(x_2, y_1) - F_{X,Y}(x_1, y_1)\).
In Fig. 5.9(b), note that \(F_{X,Y}(x_2, y_2) = P[A] + P[B] + F_{X,Y}(x_1, y_2)\).
Property (v) follows by solving for P[A] and substituting the expression for P[B].


FIGURE 5.9
The joint cdf can be used to determine the probability of various events.

FIGURE 5.10
Joint cdf for packet switch example.


Example 5.10 


Plot the joint cdf of X and Y from Example 5.5. Find the marginal cdf of X.
To find the cdf of \(\mathbf{X}\), we identify the regions in the plane according to which points in \(S_{X,Y}\)
are included in the rectangular region defined by (x, y). For example,


• The regions outside the first quadrant do not include any of the points, so \(F_{X,Y}(x, y) = 0\).
• The region {0 ≤ x < 1, 0 ≤ y < 1} contains the point (0, 0), so \(F_{X,Y}(x, y) = 1/4\).
Figure 5.10 shows the cdf after all possible regions are examined.

We need to consider several cases to find \(F_X(x)\). For x < 0, we have \(F_X(x) = 0\). For
0 ≤ x < 1, we have \(F_X(x) = F_{X,Y}(x, \infty) = 9/16\). For 1 ≤ x < 2, we have
\(F_X(x) = F_{X,Y}(x, \infty) = 15/16\). Finally, for x ≥ 2, we have \(F_X(x) = F_{X,Y}(x, \infty) = 1\).
Therefore \(F_X(x)\) is a staircase function and X is a discrete random variable with
\(p_X(0) = 9/16\), \(p_X(1) = 6/16\), and \(p_X(2) = 1/16\).
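The region-by-region bookkeeping in Example 5.10 amounts to summing the joint pmf over all points that fall in the semi-infinite rectangle defined by (x, y). The Python sketch below illustrates this with hypothetical names; `JOINT` holds the packet switch pmf values that follow from the probabilities stated in Example 5.5, and the marginal cdf is obtained by letting y be infinite, as in Eq. (5.9c).

```python
import math
from fractions import Fraction as F

# Joint pmf of the packet switch example, keyed by (x, y).
JOINT = {
    (0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 4),
    (1, 1): F(1, 8), (0, 2): F(1, 16), (2, 0): F(1, 16),
}


def joint_cdf(x, y, joint=JOINT):
    """F_{X,Y}(x, y): total pmf mass at points (j, k) with j <= x and k <= y."""
    return sum(p for (j, k), p in joint.items() if j <= x and k <= y)


def marginal_cdf_x(x, joint=JOINT):
    """F_X(x) = F_{X,Y}(x, infinity)."""
    return joint_cdf(x, math.inf, joint)
```

Evaluating `marginal_cdf_x` at points inside each of the intervals x < 0, 0 ≤ x < 1, 1 ≤ x < 2, and x ≥ 2 traces out the staircase described in the example.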


Example 5.11 
The joint cdf for the pair of random variables \(\mathbf{X} = (X, Y)\) is given by

\[
F_{X,Y}(x, y) =
\begin{cases}
0 & x < 0 \text{ or } y < 0 \\
xy & 0 \le x \le 1,\ 0 \le y \le 1 \\
x & 0 \le x \le 1,\ y > 1 \\
y & 0 \le y \le 1,\ x > 1 \\
1 & x \ge 1,\ y \ge 1.
\end{cases} \tag{5.10}
\]

Plot the joint cdf and find the marginal cdf of X.

Figure 5.11 shows a plot of the joint cdf of X and Y. \(F_{X,Y}(x, y)\) is continuous for all points
in the plane. \(F_{X,Y}(x, y) = 1\) for all x ≥ 1 and y ≥ 1, which implies that X and Y each assume
values less than or equal to one.

FIGURE 5.11
Joint cdf for two uniform random variables.

The marginal cdf of X is:

\[
F_X(x) = F_{X,Y}(x, \infty) =
\begin{cases}
0 & x < 0 \\
x & 0 \le x \le 1 \\
1 & x \ge 1.
\end{cases}
\]

X is uniformly distributed in the unit interval.


Example 5.12 
The joint cdf for the vector of random variables \(\mathbf{X} = (X, Y)\) is given by

\[
F_{X,Y}(x, y) =
\begin{cases}
(1 - e^{-\alpha x})(1 - e^{-\beta y}) & x \ge 0,\ y \ge 0 \\
0 & \text{elsewhere.}
\end{cases}
\]

Find the marginal cdf’s.
The marginal cdf’s are obtained by letting one of the variables approach infinity:

\[
\begin{aligned}
F_X(x) &= \lim_{y \to \infty} F_{X,Y}(x, y) = 1 - e^{-\alpha x} \qquad x \ge 0 \\
F_Y(y) &= \lim_{x \to \infty} F_{X,Y}(x, y) = 1 - e^{-\beta y} \qquad y \ge 0.
\end{aligned}
\]

X and Y individually have exponential distributions with parameters α and β, respectively.


Example 5.13


Find the probability of the events A = {X ≤ 1, Y ≤ 1}, B = {X > x, Y > y}, where x > 0
and y > 0, and D = {1 < X ≤ 2, 2 < Y ≤ 5} in Example 5.12.
The probability of A is given directly by the cdf:

\[
P[A] = P[X \le 1, Y \le 1] = F_{X,Y}(1, 1) = (1 - e^{-\alpha})(1 - e^{-\beta}).
\]

The probability of B requires more work. By DeMorgan’s rule:

\[
B^c = (\{X > x\} \cap \{Y > y\})^c = \{X \le x\} \cup \{Y \le y\}.
\]

Corollary 5 in Section 2.2 gives the probability of the union of two events:

\[
\begin{aligned}
P[B^c] &= P[X \le x] + P[Y \le y] - P[X \le x, Y \le y] \\
&= (1 - e^{-\alpha x}) + (1 - e^{-\beta y}) - (1 - e^{-\alpha x})(1 - e^{-\beta y}) \\
&= 1 - e^{-\alpha x}e^{-\beta y}.
\end{aligned}
\]

Finally we obtain the probability of B:

\[
P[B] = 1 - P[B^c] = e^{-\alpha x}e^{-\beta y}.
\]

You should sketch the region B on the plane and identify the events involved in the calculation
of the probability of \(B^c\).
The probability of event D is found by applying property (v) of the joint cdf:

\[
\begin{aligned}
P[1 < X \le 2,\ 2 < Y \le 5]
&= F_{X,Y}(2, 5) - F_{X,Y}(2, 2) - F_{X,Y}(1, 5) + F_{X,Y}(1, 2) \\
&= (1 - e^{-2\alpha})(1 - e^{-5\beta}) - (1 - e^{-2\alpha})(1 - e^{-2\beta}) \\
&\quad - (1 - e^{-\alpha})(1 - e^{-5\beta}) + (1 - e^{-\alpha})(1 - e^{-2\beta}).
\end{aligned}
\]

5.3.1 Random Variables That Differ in Type 


In some problems it is necessary to work with joint random variables that differ in
type, that is, one is discrete and the other is continuous. Usually it is rather clumsy to
work with the joint cdf, and so it is preferable to work with either P[X = k, Y ≤ y] or
\(P[X = k,\ y_1 < Y \le y_2]\). These probabilities are sufficient to compute the joint cdf
should we have to.


Example 5.14 Communication Channel with Discrete Input and Continuous Output 


The input X to a communication channel is +1 volt or −1 volt with equal probability. The output
Y of the channel is the input plus a noise voltage N that is uniformly distributed in the interval
from −2 volts to +2 volts. Find P[X = +1, Y ≤ 0].

This problem lends itself to the use of conditional probability:

\[
P[X = +1, Y \le y] = P[Y \le y \mid X = +1]\,P[X = +1],
\]

where P[X = +1] = 1/2. When the input X = 1, the output Y is uniformly distributed in the
interval [−1, 3]; therefore

\[
P[Y \le y \mid X = +1] = \frac{y + 1}{4} \qquad \text{for } -1 \le y \le 3.
\]

Thus P[X = +1, Y ≤ 0] = P[Y ≤ 0 | X = +1] P[X = +1] = (1/4)(1/2) = 1/8.
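The calculation in Example 5.14 extends directly to other thresholds and to the input −1. The Python sketch below is illustrative only; the function names and default arguments are hypothetical. It evaluates the conditional cdf of Y given the input and multiplies by the input probability.

```python
def cond_cdf_y(y, x_in, half_width=2.0):
    """P[Y <= y | X = x_in] when Y = x_in + N and N is uniform on [-half_width, half_width]."""
    lo, hi = x_in - half_width, x_in + half_width
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return (y - lo) / (hi - lo)


def joint_prob(x_in, y, p_input=0.5):
    """P[X = x_in, Y <= y] = P[Y <= y | X = x_in] * P[X = x_in]."""
    return cond_cdf_y(y, x_in) * p_input
```

Here `joint_prob(+1, 0.0)` corresponds to the probability found in the example, and `joint_prob(-1, 0.0)` gives the companion probability for the input −1.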


5.4 THE JOINT PDF OF TWO CONTINUOUS RANDOM VARIABLES


The joint cdf allows us to compute the probability of events that correspond to “rectangu- 
lar” shapes in the plane. To compute the probability of events corresponding to regions 
other than rectangles, we note that any reasonable shape (i.e., disk, polygon, or half-plane) 
can be approximated by the union of disjoint infinitesimal rectangles, B;,. For example, 
Fig. 5.12 shows how the events A = {X + Y < 1} and B = {X° + X? < 1} are 
approximated by rectangles of infinitesimal width. The probability of such events can 
therefore be approximated by the sum of the probabilities of infinitesimal rectangles, and 
if the cdf is sufficiently smooth, the probability of each rectangle can be expressed in 
terms of a density function: 


$$P[B] \approx \sum_j \sum_k P[B_{j,k}] = \mathop{\sum\sum}_{(x_j, y_k) \in B} f_{X,Y}(x_j, y_k)\,\Delta x\,\Delta y.$$


As $\Delta x$ and $\Delta y$ approach zero, the above equation becomes an integral of a probability
density function over the region B.

FIGURE 5.12
Some two-dimensional non-product form events.

FIGURE 5.13
The probability of A is the integral of $f_{X,Y}(x, y)$ over the region defined by A.

We say that the random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a probability density function. In other words, there is a nonnegative function $f_{X,Y}(x, y)$, called the joint probability density function, that is defined on the real plane such that for every event B, a subset of the plane,

$$P[(X, Y) \text{ in } B] = \iint_B f_{X,Y}(x', y')\,dx'\,dy', \qquad (5.11)$$


as shown in Fig. 5.13. Note the similarity to Eq. (5.5) for discrete random variables. 
When B is the entire plane, the integral must equal one: 


$$1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x', y')\,dx'\,dy'. \qquad (5.12)$$


Equations (5.11) and (5.12) again suggest that the probability “mass” of an event is 
found by integrating the density of probability mass over the region corresponding to 
the event. 

The joint cdf can be obtained in terms of the joint pdf of jointly continuous random variables by integrating over the semi-infinite rectangle defined by (x, y):

$$F_{X,Y}(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{X,Y}(x', y')\,dx'\,dy'. \qquad (5.13)$$

It then follows that if X and Y are jointly continuous random variables, then the pdf can be obtained from the cdf by differentiation:

$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\,\partial y}. \qquad (5.14)$$

Note that if X and Y are not jointly continuous, then it is possible that the above partial derivative does not exist. In particular, if $F_{X,Y}(x, y)$ is discontinuous or if its partial derivatives are discontinuous, then the joint pdf as defined by Eq. (5.14) will not exist.

The probability of a rectangular region is obtained by letting $B = \{(x, y): a_1 < x \le b_1 \text{ and } a_2 < y \le b_2\}$ in Eq. (5.11):

$$P[a_1 < X \le b_1,\ a_2 < Y \le b_2] = \int_{a_2}^{b_2}\int_{a_1}^{b_1} f_{X,Y}(x', y')\,dx'\,dy'. \qquad (5.15)$$


It then follows that the probability of an infinitesimal rectangle is the product of the 
pdf and the area of the rectangle: 


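$$P[x < X \le x + dx,\ y < Y \le y + dy] = \int_{y}^{y+dy}\int_{x}^{x+dx} f_{X,Y}(x', y')\,dx'\,dy' \approx f_{X,Y}(x, y)\,dx\,dy. \qquad (5.16)$$

Equation (5.16) can be interpreted as stating that the joint pdf specifies the probability
of the product-form events

$$\{x < X \le x + dx\} \cap \{y < Y \le y + dy\}.$$

The marginal pdf's $f_X(x)$ and $f_Y(y)$ are obtained by taking the derivative of the
corresponding marginal cdf's, $F_X(x) = F_{X,Y}(x, \infty)$ and $F_Y(y) = F_{X,Y}(\infty, y)$. Thus

$$f_X(x) = \frac{d}{dx}\int_{-\infty}^{x}\left\{\int_{-\infty}^{\infty} f_{X,Y}(x', y')\,dy'\right\}dx' = \int_{-\infty}^{\infty} f_{X,Y}(x, y')\,dy'. \qquad (5.17a)$$

Similarly,

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x', y)\,dx'. \qquad (5.17b)$$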


Thus the marginal pdf’s are obtained by integrating out the variables that are not of 
interest. 

Note that $f_X(x)\,dx \approx P[x < X \le x + dx, Y < \infty]$ is the probability of the infinitesimal strip shown in Fig. 5.14(a). This reminds us of the interpretation of the marginal pmf's as the probabilities of columns and rows in the case of discrete random variables. It is not surprising then that Eqs. (5.17a) and (5.17b) for the marginal pdf's and Eqs. (5.7a) and (5.7b) for the marginal pmf's are identical except for the fact that one contains an integral and the other a summation. As in the case of pmf's, we note that, in general, the joint pdf cannot be obtained from the marginal pdf's.

FIGURE 5.14
Interpretation of marginal pdf's: (a) $f_X(x)\,dx \approx P[x < X \le x + dx, Y < \infty]$; (b) $f_Y(y)\,dy \approx P[X < \infty, y < Y \le y + dy]$.


Example 5.15 Jointly Uniform Random Variables 


A randomly selected point (X, Y) in the unit square has the uniform joint pdf given by

$$f_{X,Y}(x, y) = \begin{cases} 1 & 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{elsewhere.} \end{cases}$$


The scattergram in Fig. 5.3(a) corresponds to this pair of random variables. Find the joint cdf of 
X and Y. 

The cdf is found by evaluating Eq. (5.13). You must be careful with the limits of the integral: the limits should define the region consisting of the intersection of the semi-infinite rectangle defined by (x, y) and the region where the pdf is nonzero. There are five cases in this problem, corresponding to the five regions shown in Fig. 5.15.

1. If x < 0 or y < 0, the pdf is zero and Eq. (5.13) implies

$$F_{X,Y}(x, y) = 0.$$

2. If (x, y) is inside the unit square,

$$F_{X,Y}(x, y) = \int_0^y\int_0^x 1\,dx'\,dy' = xy.$$

3. If $0 \le x \le 1$ and y > 1,

$$F_{X,Y}(x, y) = \int_0^1\int_0^x 1\,dx'\,dy' = x.$$

4. Similarly, if x > 1 and $0 \le y \le 1$,

$$F_{X,Y}(x, y) = y.$$

5. Finally, if x > 1 and y > 1,

$$F_{X,Y}(x, y) = \int_0^1\int_0^1 1\,dx'\,dy' = 1.$$

FIGURE 5.15
Regions that need to be considered separately in computing cdf in Example 5.15.


We see that this is the joint cdf of Example 5.11. 


Example 5.16 
Find the normalization constant c and the marginal pdf's for the following joint pdf:

$$f_{X,Y}(x, y) = \begin{cases} c\,e^{-x}e^{-y} & 0 \le y \le x < \infty \\ 0 & \text{elsewhere.} \end{cases}$$

The pdf is nonzero in the shaded region shown in Fig. 5.16(a). The constant c is found from the normalization condition specified by Eq. (5.12):

$$1 = \int_0^{\infty}\int_0^{x} c\,e^{-x}e^{-y}\,dy\,dx = \int_0^{\infty} c\,e^{-x}(1 - e^{-x})\,dx = \frac{c}{2}.$$

Therefore c = 2. The marginal pdf's are found by evaluating Eqs. (5.17a) and (5.17b):

$$f_X(x) = \int_0^{\infty} f_{X,Y}(x, y)\,dy = \int_0^{x} 2e^{-x}e^{-y}\,dy = 2e^{-x}(1 - e^{-x}) \qquad 0 \le x < \infty$$

and

$$f_Y(y) = \int_0^{\infty} f_{X,Y}(x, y)\,dx = \int_y^{\infty} 2e^{-x}e^{-y}\,dx = 2e^{-2y} \qquad 0 \le y < \infty.$$

You should fill in the steps in the evaluation of the integrals as well as verify that the marginal pdf's integrate to 1.
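
The suggested verification can also be done numerically. The sketch below is an illustration only: the midpoint-rule helper, the step count, and the truncation point `UPPER` are assumptions, chosen so that the neglected exponential tails are far below the tolerance of interest.

```python
import math


def midpoint_integral(func, lower, upper, steps=20_000):
    """Approximate the integral of func over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def marginal_x(x):
    """f_X(x) = 2 e^{-x} (1 - e^{-x}) for x >= 0."""
    return 2.0 * math.exp(-x) * (1.0 - math.exp(-x))


def marginal_y(y):
    """f_Y(y) = 2 e^{-2y} for y >= 0."""
    return 2.0 * math.exp(-2.0 * y)


UPPER = 40.0  # truncation point (assumption): tails beyond it are negligible

# The inner integral over y is done by hand, leaving the integral of
# e^{-x}(1 - e^{-x}) over x, which should equal 1/c = 1/2.
c = 1.0 / midpoint_integral(
    lambda x: math.exp(-x) * (1.0 - math.exp(-x)), 0.0, UPPER)
area_x = midpoint_integral(marginal_x, 0.0, UPPER)
area_y = midpoint_integral(marginal_y, 0.0, UPPER)
print(c, area_x, area_y)
```

Both areas should come out numerically indistinguishable from 1, and the recovered constant from 2.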
FIGURE 5.16
The random variables X and Y in Examples 5.16 and 5.17 have a pdf that is nonzero only in the shaded
region shown in part (a).


Example 5.17 


Find $P[X + Y \le 1]$ in Example 5.16.

Figure 5.16(b) shows the intersection of the event $\{X + Y \le 1\}$ and the region where the pdf is nonzero. We obtain the probability of the event by "adding" (actually integrating) infinitesimal rectangles of width dy as indicated in the figure:

$$P[X + Y \le 1] = \int_0^{0.5}\int_y^{1-y} 2e^{-x}e^{-y}\,dx\,dy = \int_0^{0.5} 2e^{-y}\left[e^{-y} - e^{-(1-y)}\right]dy = 1 - 2e^{-1}.$$
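
A simulation gives an independent check of this result. The sketch below is not part of the example. It assumes the factorization $f_{X,Y}(x, y) = f_Y(y) f_X(x \mid y)$ with $f_Y(y) = 2e^{-2y}$ from Example 5.16 and $f_X(x \mid y) = e^{-(x-y)}$ for $x \ge y$ (the conditional pdf that appears in Example 5.32); function names, trial count, and seed are illustrative.

```python
import math
import random


def sample_pair(rng):
    """Draw (X, Y) with joint pdf 2 e^{-x} e^{-y} on 0 <= y <= x."""
    y = rng.expovariate(2.0)        # marginal of Y: 2 e^{-2y}
    x = y + rng.expovariate(1.0)    # given Y = y, X - y is exponential with rate 1
    return x, y


def estimate_prob_sum_at_most_one(num_trials=200_000, seed=7):
    """Relative frequency of the event {X + Y <= 1}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_trials):
        x, y = sample_pair(rng)
        if x + y <= 1.0:
            hits += 1
    return hits / num_trials


estimate = estimate_prob_sum_at_most_one()
exact = 1.0 - 2.0 * math.exp(-1.0)   # closed form from the example
print(estimate, exact)
```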


Example 5.18 Jointly Gaussian Random Variables 


The joint pdf of X and Y, shown in Fig. 5.17, is

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-(x^2 - 2\rho xy + y^2)/[2(1 - \rho^2)]} \qquad -\infty < x, y < \infty. \qquad (5.18)$$

We say that X and Y are jointly Gaussian.¹ Find the marginal pdf's.

The marginal pdf of X is found by integrating $f_{X,Y}(x, y)$ over y:

$$f_X(x) = \frac{e^{-x^2/[2(1 - \rho^2)]}}{2\pi\sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} e^{-(y^2 - 2\rho xy)/[2(1 - \rho^2)]}\,dy.$$

We complete the square of the argument of the exponent by adding and subtracting $\rho^2x^2$, that is, $y^2 - 2\rho xy + \rho^2x^2 - \rho^2x^2 = (y - \rho x)^2 - \rho^2x^2$. Therefore

$$f_X(x) = \frac{e^{-x^2/[2(1 - \rho^2)]}}{2\pi\sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} e^{-[(y - \rho x)^2 - \rho^2x^2]/[2(1 - \rho^2)]}\,dy$$

$$= \frac{e^{-x^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-(y - \rho x)^2/[2(1 - \rho^2)]}}{\sqrt{2\pi(1 - \rho^2)}}\,dy$$

$$= \frac{e^{-x^2/2}}{\sqrt{2\pi}},$$

where we have noted that the last integral equals one since its integrand is a Gaussian pdf with mean $\rho x$ and variance $1 - \rho^2$. The marginal pdf of X is therefore a one-dimensional Gaussian pdf with mean 0 and variance 1. From the symmetry of $f_{X,Y}(x, y)$ in x and y, we conclude that the marginal pdf of Y is also a one-dimensional Gaussian pdf with zero mean and unit variance.

FIGURE 5.17
Joint pdf of two jointly Gaussian random variables.

¹This is an important special case of jointly Gaussian random variables. The general case is discussed in Section 5.9.
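
The marginalization in Example 5.18 can be checked numerically by integrating the joint pdf of Eq. (5.18) over y and comparing with the standard Gaussian pdf. This is a sketch under stated assumptions: the integration window centered at $\rho x$, its half-width, and the step count are illustrative choices, not part of the example.

```python
import math


def joint_gaussian_pdf(x, y, rho):
    """Joint pdf of Eq. (5.18): zero means, unit variances, parameter rho."""
    quad = (x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))
    return math.exp(-quad) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))


def marginal_x(x, rho, half_width=12.0, steps=4000):
    """Integrate the joint pdf over y with the midpoint rule.

    The window is centered at rho * x, where the integrand peaks.
    """
    lower = rho * x - half_width
    width = 2.0 * half_width / steps
    return width * sum(
        joint_gaussian_pdf(x, lower + (i + 0.5) * width, rho)
        for i in range(steps))


def standard_gaussian_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


for rho in (-0.8, 0.0, 0.5):
    for x in (-1.5, 0.0, 2.0):
        print(rho, x, marginal_x(x, rho), standard_gaussian_pdf(x))
```

For every value of $\rho$ tried, the two printed columns should agree, which illustrates that the marginal does not depend on $\rho$.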


INDEPENDENCE OF TWO RANDOM VARIABLES 


X and Y are independent random variables if any event $A_1$ defined in terms of X is independent of any event $A_2$ defined in terms of Y; that is,

$$P[X \text{ in } A_1, Y \text{ in } A_2] = P[X \text{ in } A_1]\,P[Y \text{ in } A_2]. \qquad (5.19)$$


In this section we present a simple set of conditions for determining when X and Y are 
independent. 

Suppose that X and Y are a pair of discrete random variables, and suppose we are interested in the probability of the event $A = A_1 \cap A_2$, where $A_1$ involves only X and $A_2$ involves only Y. In particular, if X and Y are independent, then $A_1$ and $A_2$ are independent events. If we let $A_1 = \{X = x_j\}$ and $A_2 = \{Y = y_k\}$, then the independence of X and Y implies that

$$p_{X,Y}(x_j, y_k) = P[X = x_j, Y = y_k] = P[X = x_j]\,P[Y = y_k] = p_X(x_j)\,p_Y(y_k) \quad \text{for all } x_j \text{ and } y_k. \qquad (5.20)$$

Therefore, if X and Y are independent discrete random variables, then the joint pmf is
equal to the product of the marginal pmf's.


Now suppose that we don't know if X and Y are independent, but we do know that the pmf satisfies Eq. (5.20). Let $A = A_1 \cap A_2$ be a product-form event as above, then

$$P[A] = \sum_{x_j \text{ in } A_1} \sum_{y_k \text{ in } A_2} p_{X,Y}(x_j, y_k)$$

$$= \sum_{x_j \text{ in } A_1} \sum_{y_k \text{ in } A_2} p_X(x_j)\,p_Y(y_k)$$

$$= \sum_{x_j \text{ in } A_1} p_X(x_j) \sum_{y_k \text{ in } A_2} p_Y(y_k)$$

$$= P[A_1]\,P[A_2], \qquad (5.21)$$

which implies that $A_1$ and $A_2$ are independent events. Therefore, if the joint pmf of X and Y equals the product of the marginal pmf's, then X and Y are independent. We have just proved that the statement "X and Y are independent" is equivalent to the statement "the joint pmf is equal to the product of the marginal pmf's." In mathematical language, we say, the "discrete random variables X and Y are independent if and only if the joint pmf is equal to the product of the marginal pmf's for all $x_j$, $y_k$."
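
The product-form condition is easy to test mechanically for a finite joint pmf. The sketch below is an illustration: the function names are hypothetical, and the "loaded" table (2/42 on the diagonal, 1/42 elsewhere) is an assumed stand-in for a dependent pair of dice. Exact rational arithmetic avoids rounding issues in the equality test.

```python
from fractions import Fraction


def marginals(joint):
    """Return (p_X, p_Y) for a joint pmf given as a table joint[j][k]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def is_independent(joint):
    """Check p_{X,Y}(x_j, y_k) == p_X(x_j) p_Y(y_k) for every pair (j, k)."""
    p_x, p_y = marginals(joint)
    return all(joint[j][k] == p_x[j] * p_y[k]
               for j in range(len(p_x)) for k in range(len(p_y)))


# Two independent fair dice: every pair of faces has probability 1/36.
fair = [[Fraction(1, 36)] * 6 for _ in range(6)]

# Assumed loaded-dice table: matching faces are twice as likely as the rest.
loaded = [[Fraction(2, 42) if j == k else Fraction(1, 42) for k in range(6)]
          for j in range(6)]

print(is_independent(fair), is_independent(loaded))
```

Both tables have uniform marginals, so the marginals alone cannot distinguish them; only the joint pmf reveals the dependence.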


Example 5.19 


Is the pmf in Example 5.6 consistent with an experiment that consists of the independent tosses 
of two fair dice? 

The probability of each face in a toss of a fair die is 1/6. If two fair dice are tossed and if the tosses are independent, then the probability of any pair of faces, say j and k, is:

$$P[X = j, Y = k] = P[X = j]\,P[Y = k] = \frac{1}{36}.$$

Thus all possible pairs of outcomes should be equiprobable. This is not the case for the joint pmf given in Example 5.6. Therefore the tosses in Example 5.6 are not independent.


Example 5.20 


Are Q and R in Example 5.9 independent? From Example 5.9 we have

$$P[Q = q]\,P[R = r] = (1 - p^M)(p^M)^q\,\frac{(1 - p)}{1 - p^M}\,p^r = (1 - p)\,p^{Mq + r} = P[Q = q, R = r] \quad \text{for all } q = 0, 1, \ldots;\ r = 0, \ldots, M - 1.$$

Therefore Q and R are independent.


In general, it can be shown that the random variables X and Y are independent if and only if their joint cdf is equal to the product of their marginal cdf's:

$$F_{X,Y}(x, y) = F_X(x)\,F_Y(y) \quad \text{for all } x \text{ and } y. \qquad (5.22)$$

Similarly, if X and Y are jointly continuous, then X and Y are independent if and only if their joint pdf is equal to the product of the marginal pdf's:

$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y) \quad \text{for all } x \text{ and } y. \qquad (5.23)$$


Equation (5.23) is obtained from Eq. (5.22) by differentiation. Conversely, Eq. (5.22) is 
obtained from Eq. (5.23) by integration. 


Example 5.21 


Are the random variables X and Y in Example 5.16 independent? 

Note that $f_X(x)$ and $f_Y(y)$ are nonzero for all x > 0 and all y > 0. Hence $f_X(x)f_Y(y)$ is nonzero in the entire positive quadrant. However $f_{X,Y}(x, y)$ is nonzero only in the region y < x inside the positive quadrant. Hence Eq. (5.23) does not hold for all x, y and the random variables are not independent. You should note that in this example the joint pdf appears to factor, but nevertheless it is not the product of the marginal pdf's.


Example 5.22 


Are the random variables X and Y in Example 5.18 independent? The product of the marginal pdf's of X and Y in Example 5.18 is

$$f_X(x)\,f_Y(y) = \frac{1}{2\pi}\,e^{-(x^2 + y^2)/2} \qquad -\infty < x, y < \infty.$$

By comparing to Eq. (5.18) we see that the product of the marginals is equal to the joint pdf if and only if $\rho = 0$. Therefore the jointly Gaussian random variables X and Y are independent if and only if $\rho = 0$. We see in a later section that $\rho$ is the correlation coefficient between X and Y.


Example 5.23 


Are the random variables X and Y independent in Example 5.12? If we multiply the marginal 
cdf’s found in Example 5.12 we find 


$$F_X(x)\,F_Y(y) = (1 - e^{-\alpha x})(1 - e^{-\beta y}) = F_{X,Y}(x, y) \quad \text{for all } x \text{ and } y.$$

Therefore Eq. (5.22) is satisfied so X and Y are independent.


If X and Y are independent random variables, then the random variables defined by any pair of functions g(X) and h(Y) are also independent. To show this, consider the one-dimensional events A and B. Let A' be the set of all values of x such that if x is in A' then g(x) is in A, and let B' be the set of all values of y such that if y is in B' then h(y) is in B. (In Chapter 3 we called A' and B' the equivalent events of A and B.) Then

$$P[g(X) \text{ in } A, h(Y) \text{ in } B] = P[X \text{ in } A', Y \text{ in } B'] = P[X \text{ in } A']\,P[Y \text{ in } B'] = P[g(X) \text{ in } A]\,P[h(Y) \text{ in } B]. \qquad (5.24)$$

The first and third equalities follow from the fact that A and A' and B and B' are equivalent events. The second equality follows from the independence of X and Y. Thus g(X) and h(Y) are independent random variables.


5.6 JOINT MOMENTS AND EXPECTED VALUES OF A FUNCTION OF TWO RANDOM
VARIABLES


The expected value of X identifies the center of mass of the distribution of X. The variance, which is defined as the expected value of $(X - m)^2$, provides a measure of the spread of the distribution. In the case of two random variables we are interested in how X and Y vary together. In particular, we are interested in whether the variations of X and Y are correlated. For example, if X increases does Y tend to increase or to decrease? The joint moments of X and Y, which are defined as expected values of functions of X and Y, provide this information.


5.6.1 Expected Value of a Function of Two Random Variables


The problem of finding the expected value of a function of two or more random vari- 
ables is similar to that of finding the expected value of a function of a single random 
variable. It can be shown that the expected value of Z = g(X, Y) can be found using 
the following expressions: 


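$$E[Z] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy & X, Y \text{ jointly continuous} \\ \displaystyle\sum_i \sum_n g(x_i, y_n)\,p_{X,Y}(x_i, y_n) & X, Y \text{ discrete.} \end{cases} \qquad (5.25)$$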


Example 5.24 Sum of Random Variables 


Let Z = X + Y. Find E[Z]. 


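$$E[Z] = E[X + Y]$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x' + y')\,f_{X,Y}(x', y')\,dx'\,dy'$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x'\,f_{X,Y}(x', y')\,dy'\,dx' + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y'\,f_{X,Y}(x', y')\,dx'\,dy'$$

$$= \int_{-\infty}^{\infty} x'\,f_X(x')\,dx' + \int_{-\infty}^{\infty} y'\,f_Y(y')\,dy' = E[X] + E[Y]. \qquad (5.26)$$

Thus the expected value of the sum of two random variables is equal to the sum of the individual
expected values. Note that X and Y need not be independent.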


The result in Example 5.24 and a simple induction argument show that the ex- 
pected value of a sum of n random variables is equal to the sum of the expected values: 


$$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]. \qquad (5.27)$$


Note that the random variables do not have to be independent. 


Example 5.25 Product of Functions of Independent Random Variables 


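Suppose that X and Y are independent random variables, and let $g(X, Y) = g_1(X)\,g_2(Y)$. Find $E[g(X, Y)] = E[g_1(X)\,g_2(Y)]$.

$$E[g_1(X)\,g_2(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_1(x')\,g_2(y')\,f_X(x')\,f_Y(y')\,dx'\,dy'$$

$$= \left\{\int_{-\infty}^{\infty} g_1(x')\,f_X(x')\,dx'\right\}\left\{\int_{-\infty}^{\infty} g_2(y')\,f_Y(y')\,dy'\right\}$$

$$= E[g_1(X)]\,E[g_2(Y)].$$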


5.6.2 Joint Moments, Correlation, and Covariance 


The joint moments of two random variables X and Y summarize information about 
their joint behavior. The jkth joint moment of X and Y is defined by 


$$E[X^jY^k] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^j y^k\,f_{X,Y}(x, y)\,dx\,dy & X, Y \text{ jointly continuous} \\ \displaystyle\sum_i \sum_n x_i^j\,y_n^k\,p_{X,Y}(x_i, y_n) & X, Y \text{ discrete.} \end{cases} \qquad (5.28)$$

If j = 0, we obtain the moments of Y, and if k = 0, we obtain the moments of X. In electrical engineering, it is customary to call the j = 1, k = 1 moment, E[XY], the correlation of X and Y. If E[XY] = 0, then we say that X and Y are orthogonal.

The jkth central moment of X and Y is defined as the joint moment of the cen- 
tered random variables, X — E[ X] and Y — E[Y]: 


$$E[(X - E[X])^j\,(Y - E[Y])^k].$$

Note that j = 2, k = 0 gives VAR(X) and j = 0, k = 2 gives VAR(Y).
The covariance of X and Y is defined as the j = k = 1 central moment:

$$\mathrm{COV}(X, Y) = E[(X - E[X])(Y - E[Y])]. \qquad (5.29)$$

The following form for COV(X, Y) is sometimes more convenient to work with:

$$\mathrm{COV}(X, Y) = E[XY - XE[Y] - YE[X] + E[X]E[Y]]$$

$$= E[XY] - 2E[X]E[Y] + E[X]E[Y]$$

$$= E[XY] - E[X]E[Y]. \qquad (5.30)$$

Note that COV(X, Y) = E[XY] if either of the random variables has mean zero.


Example 5.26 Covariance of Independent Random Variables 
Let X and Y be independent random variables. Find their covariance. 
$$\mathrm{COV}(X, Y) = E[(X - E[X])(Y - E[Y])]$$
$$= E[X - E[X]]\,E[Y - E[Y]]$$
$$= 0,$$


where the second equality follows from the fact that X and Y are independent, and the third 
equality follows from E[X — E[X]] = E[X] — E[X] = 0. Therefore pairs of independent 
random variables have covariance zero. 


Let's see how the covariance measures the correlation between X and Y. The covariance measures the deviation from $m_X = E[X]$ and $m_Y = E[Y]$. If a positive value of $(X - m_X)$ tends to be accompanied by a positive value of $(Y - m_Y)$, and a negative $(X - m_X)$ tends to be accompanied by a negative $(Y - m_Y)$, then $(X - m_X)(Y - m_Y)$ will tend to be a positive value, and its expected value, COV(X, Y), will be positive. This is the case for the scattergram in Fig. 5.3(d) where the observed points tend to cluster along a line of positive slope. On the other hand, if $(X - m_X)$ and $(Y - m_Y)$ tend to have opposite signs, then COV(X, Y) will be negative. A scattergram for this case would have observation points cluster along a line of negative slope. Finally, if $(X - m_X)$ and $(Y - m_Y)$ sometimes have the same sign and sometimes have opposite signs, then COV(X, Y) will be close to zero. The three scattergrams in Figs. 5.3(a), (b), and (c) fall into this category.

Multiplying either X or Y by a large number will increase the covariance, so we need to normalize the covariance to measure the correlation in an absolute scale. The correlation coefficient of X and Y is defined by

$$\rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sigma_X\sigma_Y} = \frac{E[XY] - E[X]E[Y]}{\sigma_X\sigma_Y}, \qquad (5.31)$$

where $\sigma_X = \sqrt{\mathrm{VAR}(X)}$ and $\sigma_Y = \sqrt{\mathrm{VAR}(Y)}$ are the standard deviations of X and Y, respectively.

The correlation coefficient is a number that is at most 1 in magnitude:

$$-1 \le \rho_{X,Y} \le 1. \qquad (5.32)$$

To show Eq. (5.32), we begin with an inequality that results from the fact that the expected value of the square of a random variable is nonnegative:

$$0 \le E\left[\left(\frac{X - E[X]}{\sigma_X} \pm \frac{Y - E[Y]}{\sigma_Y}\right)^2\right] = 1 \pm 2\rho_{X,Y} + 1 = 2(1 \pm \rho_{X,Y}).$$

The last equation implies Eq. (5.32).

The extreme values of $\rho_{X,Y}$ are achieved when X and Y are related linearly, Y = aX + b; $\rho_{X,Y} = 1$ if a > 0 and $\rho_{X,Y} = -1$ if a < 0. In Section 6.5 we show that $\rho_{X,Y}$ can be viewed as a statistical measure of the extent to which Y can be predicted by a linear function of X.

X and Y are said to be uncorrelated if $\rho_{X,Y} = 0$. If X and Y are independent, then COV(X, Y) = 0, so $\rho_{X,Y} = 0$. Thus if X and Y are independent, then X and Y are uncorrelated. In Example 5.22, we saw that if X and Y are jointly Gaussian and $\rho_{X,Y} = 0$, then X and Y are independent random variables. Example 5.27 shows that this is not always true for non-Gaussian random variables: It is possible for X and Y to be uncorrelated but not independent.


Example 5.27 Uncorrelated but Dependent Random Variables 
Let $\Theta$ be uniformly distributed in the interval $(0, 2\pi)$. Let

$$X = \cos\Theta \quad \text{and} \quad Y = \sin\Theta.$$

The point (X, Y) then corresponds to the point on the unit circle specified by the angle $\Theta$, as shown in Fig. 5.18. In Example 4.36, we saw that the marginal pdf's of X and Y are arcsine pdf's, which are nonzero in the interval (-1, 1). The product of the marginals is nonzero in the square defined by $-1 \le x \le 1$ and $-1 \le y \le 1$, so if X and Y were independent the point (X, Y) would assume all values in this square. This is not the case, so X and Y are dependent.

We now show that X and Y are uncorrelated:

$$E[XY] = E[\sin\Theta\cos\Theta] = \frac{1}{2\pi}\int_0^{2\pi} \sin\phi\cos\phi\,d\phi = \frac{1}{4\pi}\int_0^{2\pi} \sin 2\phi\,d\phi = 0.$$

Since E[X] = E[Y] = 0, Eq. (5.30) then implies that X and Y are uncorrelated.
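
A short simulation makes the distinction concrete. The sketch below is an illustration with assumed names, sample size, and seed. It estimates COV(X, Y), which should be near zero, and also COV(X², Y²). If X and Y were independent, Eq. (5.24) would make X² and Y² independent and hence uncorrelated; here X² + Y² = 1, and a direct calculation gives COV(X², Y²) = E[X²Y²] - E[X²]E[Y²] = 1/8 - 1/4 = -1/8.

```python
import math
import random


def sample_covariance(us, vs):
    """Sample covariance of two equally long sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n


rng = random.Random(3)
thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(100_000)]
xs = [math.cos(t) for t in thetas]
ys = [math.sin(t) for t in thetas]

cov_xy = sample_covariance(xs, ys)                  # expected near 0
cov_squares = sample_covariance([x * x for x in xs],
                                [y * y for y in ys])  # expected near -1/8
print(cov_xy, cov_squares)
```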


Example 5.28 


Let X and Y be the random variables discussed in Example 5.16. Find E[XY], COV(X, Y), and $\rho_{X,Y}$.

Equations (5.30) and (5.31) require that we find the mean, variance, and correlation of X and Y. From the marginal pdf's of X and Y obtained in Example 5.16, we find that E[X] = 3/2 and VAR[X] = 5/4, and that E[Y] = 1/2 and VAR[Y] = 1/4. The correlation of X and Y is

$$E[XY] = \int_0^{\infty}\int_0^{x} xy\,2e^{-x}e^{-y}\,dy\,dx = \int_0^{\infty} 2xe^{-x}\left(1 - e^{-x} - xe^{-x}\right)dx = 1.$$

FIGURE 5.18
(X, Y) is a point selected at random on the unit circle. X and Y are uncorrelated but not independent.

Thus the correlation coefficient is given by

$$\rho_{X,Y} = \frac{1 - \dfrac{3}{2}\cdot\dfrac{1}{2}}{\sqrt{\dfrac{5}{4}}\sqrt{\dfrac{1}{4}}} = \frac{1}{\sqrt{5}}.$$
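
The moments quoted in Example 5.28 can be reproduced by numerical integration of the marginal pdf's from Example 5.16 and of the single integral for E[XY]. This is a sketch only: the midpoint-rule helper, step count, and truncation point are illustrative assumptions.

```python
import math


def midpoint_integral(func, lower, upper, steps=40_000):
    """Approximate the integral of func over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def f_x(x):
    return 2.0 * math.exp(-x) * (1.0 - math.exp(-x))


def f_y(y):
    return 2.0 * math.exp(-2.0 * y)


def correlation_integrand(x):
    """Integrand left after the inner integral over y has been evaluated."""
    return 2.0 * x * math.exp(-x) * (1.0 - math.exp(-x) - x * math.exp(-x))


UPPER = 50.0  # truncation point (assumption): the tails beyond it are negligible

mean_x = midpoint_integral(lambda x: x * f_x(x), 0.0, UPPER)
mean_y = midpoint_integral(lambda y: y * f_y(y), 0.0, UPPER)
var_x = midpoint_integral(lambda x: x * x * f_x(x), 0.0, UPPER) - mean_x ** 2
var_y = midpoint_integral(lambda y: y * y * f_y(y), 0.0, UPPER) - mean_y ** 2
corr_xy = midpoint_integral(correlation_integrand, 0.0, UPPER)   # E[XY]

cov_xy = corr_xy - mean_x * mean_y                # Eq. (5.30)
rho_xy = cov_xy / math.sqrt(var_x * var_y)        # Eq. (5.31)
print(mean_x, var_x, mean_y, var_y, corr_xy, cov_xy, rho_xy)
```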
Many random variables of practical interest are not independent: The output Y of a communication channel must depend on the input X in order to convey information; consecutive samples of a waveform that varies slowly are likely to be close in value and hence are not independent. In this section we are interested in computing the probability of events concerning the random variable Y given that we know X = x. We are also interested in the expected value of Y given X = x. We show that the notions of conditional probability and conditional expectation are extremely useful tools in solving problems, even in situations where we are only concerned with one of the random variables.


5.7.1 Conditional Probability 


The definition of conditional probability in Section 2.4 allows us to compute the probability that Y is in A given that we know that X = x:

$$P[Y \text{ in } A \mid X = x] = \frac{P[Y \text{ in } A, X = x]}{P[X = x]} \quad \text{for } P[X = x] > 0. \qquad (5.33)$$
Case 1: X Is a Discrete Random Variable
For X and Y discrete random variables, the conditional pmf of Y given X = x is defined by:

$$p_Y(y \mid x) = P[Y = y \mid X = x] = \frac{P[X = x, Y = y]}{P[X = x]} = \frac{p_{X,Y}(x, y)}{p_X(x)} \qquad (5.34)$$

for x such that P[X = x] > 0. We define $p_Y(y \mid x) = 0$ for x such that P[X = x] = 0. Note that $p_Y(y \mid x)$ is a function of y over the real line, and that $p_Y(y \mid x) > 0$ only for y in a discrete set $\{y_1, y_2, \ldots\}$.

The conditional pmf satisfies all the properties of a pmf, that is, it assigns nonnegative values to every y and these values add to 1. Note from Eq. (5.34) that $p_Y(y \mid x_k)$ is simply the cross section of $p_{X,Y}(x_k, y)$ along the $X = x_k$ column in Fig. 5.6, but normalized by the probability $p_X(x_k)$.

The probability of an event A given $X = x_k$ is found by adding the pmf values of the outcomes in A:

$$P[Y \text{ in } A \mid X = x_k] = \sum_{y_j \text{ in } A} p_Y(y_j \mid x_k). \qquad (5.35)$$

If X and Y are independent, then using Eq. (5.20)

$$p_Y(y_j \mid x_k) = \frac{P[X = x_k, Y = y_j]}{P[X = x_k]} = P[Y = y_j] = p_Y(y_j). \qquad (5.36)$$
In other words, knowledge that $X = x_k$ does not affect the probability of events A involving Y.

Equation (5.34) implies that the joint pmf $p_{X,Y}(x, y)$ can be expressed as the product of a conditional pmf and a marginal pmf:

$$p_{X,Y}(x_k, y_j) = p_Y(y_j \mid x_k)\,p_X(x_k) \quad \text{and} \quad p_{X,Y}(x_k, y_j) = p_X(x_k \mid y_j)\,p_Y(y_j). \qquad (5.37)$$

This expression is very useful when we can view the pair (X, Y) as being generated sequentially, e.g., first X, and then Y given X = x. We find the probability that Y is in A as follows:

$$P[Y \text{ in } A] = \sum_{\text{all } x_k} \sum_{y_j \text{ in } A} p_{X,Y}(x_k, y_j)$$

$$= \sum_{\text{all } x_k} \sum_{y_j \text{ in } A} p_Y(y_j \mid x_k)\,p_X(x_k)$$

$$= \sum_{\text{all } x_k} p_X(x_k) \sum_{y_j \text{ in } A} p_Y(y_j \mid x_k)$$

$$= \sum_{\text{all } x_k} P[Y \text{ in } A \mid X = x_k]\,p_X(x_k). \qquad (5.38)$$

Equation (5.38) is simply a restatement of the theorem on total probability discussed in Chapter 2. In other words, to compute P[Y in A] we can first compute $P[Y \text{ in } A \mid X = x_k]$ and then "average" over $x_k$.
Example 5.29 Loaded Dice


Find $p_Y(y \mid 5)$ in the loaded dice experiment considered in Examples 5.6 and 5.8.
In Example 5.8 we found that $p_X(5) = 1/6$. Therefore:

$$p_Y(y \mid 5) = \frac{p_{X,Y}(5, y)}{p_X(5)},$$

so that

$$p_Y(5 \mid 5) = 2/7 \quad \text{and} \quad p_Y(1 \mid 5) = p_Y(2 \mid 5) = p_Y(3 \mid 5) = p_Y(4 \mid 5) = p_Y(6 \mid 5) = 1/7.$$

Clearly this die is loaded.


Example 5.30 Number of Defects in a Region; Random Splitting of Poisson Counts 


The total number of defects X on a chip is a Poisson random variable with mean $\alpha$. Each defect has a probability p of falling in a specific region R and the location of each defect is independent of the locations of other defects. Find the pmf of the number of defects Y that fall in the region R.

We can imagine performing a Bernoulli trial each time a defect occurs with a "success" occurring when the defect falls in the region R. If the total number of defects is X = k, then Y is a binomial random variable with parameters k and p:

$$p_Y(j \mid k) = \begin{cases} 0 & j > k \\ \dbinom{k}{j} p^j (1 - p)^{k-j} & 0 \le j \le k. \end{cases}$$

From Eq. (5.38) and noting that $k \ge j$, we have

$$p_Y(j) = \sum_{k=0}^{\infty} p_Y(j \mid k)\,p_X(k) = \sum_{k=j}^{\infty} \frac{k!}{j!\,(k - j)!}\,p^j (1 - p)^{k-j}\,\frac{\alpha^k}{k!}\,e^{-\alpha}$$

$$= \frac{(\alpha p)^j e^{-\alpha}}{j!} \sum_{k=j}^{\infty} \frac{\{(1 - p)\alpha\}^{k-j}}{(k - j)!}$$

$$= \frac{(\alpha p)^j e^{-\alpha}}{j!}\,e^{(1-p)\alpha} = \frac{(\alpha p)^j}{j!}\,e^{-\alpha p}.$$

Thus Y is a Poisson random variable with mean $\alpha p$.
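
The sum in Eq. (5.38) can be evaluated directly to confirm this result numerically. In the sketch below the parameter values $\alpha = 4$ and p = 0.3 and the truncation point `k_max` are illustrative assumptions; the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


def binomial_pmf(j, k, p):
    return math.comb(k, j) * p ** j * (1.0 - p) ** (k - j)


def thinned_pmf(j, alpha, p, k_max=150):
    """Total-probability sum over k of p_Y(j | k) p_X(k), truncated at k_max."""
    return sum(binomial_pmf(j, k, p) * poisson_pmf(k, alpha)
               for k in range(j, k_max + 1))


ALPHA, P = 4.0, 0.3   # illustrative parameter values (assumption)
for j in range(6):
    print(j, thinned_pmf(j, ALPHA, P), poisson_pmf(j, ALPHA * P))
```

The two printed columns should agree to within floating-point rounding, since the truncated tail of the Poisson sum is negligible for these parameters.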


Suppose Y is a continuous random variable. Eq. (5.33) can be used to define the conditional cdf of Y given $X = x_k$:

$$F_Y(y \mid x_k) = \frac{P[Y \le y, X = x_k]}{P[X = x_k]} \quad \text{for } P[X = x_k] > 0. \qquad (5.39)$$

It is easy to show that $F_Y(y \mid x_k)$ satisfies all the properties of a cdf. The conditional pdf of Y given $X = x_k$, if the derivative exists, is given by

$$f_Y(y \mid x_k) = \frac{d}{dy} F_Y(y \mid x_k). \qquad (5.40)$$

If X and Y are independent, $P[Y \le y, X = x_k] = P[Y \le y]\,P[X = x_k]$ so $F_Y(y \mid x) = F_Y(y)$ and $f_Y(y \mid x) = f_Y(y)$. The probability of event A given $X = x_k$ is obtained by integrating the conditional pdf:

$$P[Y \text{ in } A \mid X = x_k] = \int_{y \text{ in } A} f_Y(y \mid x_k)\,dy. \qquad (5.41)$$

We obtain P[Y in A] using Eq. (5.38).


Example 5.31 Binary Communications System 


The input X to a communication channel assumes the values +1 or -1 with probabilities 1/3 and 2/3. The output Y of the channel is given by Y = X + N, where N is a zero-mean, unit-variance Gaussian random variable. Find the conditional pdf of Y given X = +1, and given X = -1. Find $P[X = +1 \mid Y > 0]$.

The conditional cdf of Y given X = +1 is:

$$F_Y(y \mid +1) = P[Y \le y \mid X = +1] = P[N + 1 \le y] = P[N \le y - 1] = \int_{-\infty}^{y-1} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx,$$

where we noted that if X = +1, then Y = N + 1 and Y depends only on N. Thus, if X = +1, then Y is a Gaussian random variable with mean 1 and unit variance. Similarly, if X = -1, then Y is Gaussian with mean -1 and unit variance.

The probabilities that Y > 0 given X = +1 and X = -1 are:

$$P[Y > 0 \mid X = +1] = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(x-1)^2/2}\,dx = \int_{-1}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt = 1 - Q(1) = 0.841,$$

$$P[Y > 0 \mid X = -1] = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(x+1)^2/2}\,dx = \int_{1}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt = Q(1) = 0.159.$$

Applying Eq. (5.38), we obtain:

$$P[Y > 0] = P[Y > 0 \mid X = +1]\,\frac{1}{3} + P[Y > 0 \mid X = -1]\,\frac{2}{3} = 0.386.$$

From Bayes' theorem we find:

$$P[X = +1 \mid Y > 0] = \frac{P[Y > 0 \mid X = +1]\,P[X = +1]}{P[Y > 0]} = \frac{(1 - Q(1))/3}{(1 + Q(1))/3} = 0.726.$$

We conclude that if Y > 0, then X = +1 is more likely than X = -1. Therefore the receiver should decide that the input is X = +1 when it observes Y > 0.
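
The numbers in this example can be reproduced with the complementary error function, using the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The variable and function names below are illustrative.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P[N > x] for a standard Gaussian N."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


P_PLUS, P_MINUS = 1.0 / 3.0, 2.0 / 3.0     # input probabilities from the example

p_pos_given_plus = 1.0 - q_function(1.0)   # P[Y > 0 | X = +1]
p_pos_given_minus = q_function(1.0)        # P[Y > 0 | X = -1]

# Theorem on total probability, Eq. (5.38).
p_pos = p_pos_given_plus * P_PLUS + p_pos_given_minus * P_MINUS

# Bayes' theorem.
posterior_plus = p_pos_given_plus * P_PLUS / p_pos

print(round(p_pos_given_plus, 3), round(p_pos_given_minus, 3),
      round(p_pos, 3), round(posterior_plus, 3))
```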


In the previous example, we made an interesting step that is worth elaborating on because it comes up quite frequently: $P[Y \le y \mid X = +1] = P[N + 1 \le y]$, where Y = X + N. Let's take a closer look:

$$P[Y \le z \mid X = x] = \frac{P[\{X + N \le z\} \cap \{X = x\}]}{P[X = x]} = \frac{P[\{x + N \le z\} \cap \{X = x\}]}{P[X = x]}$$

$$= P[x + N \le z \mid X = x] = P[N \le z - x \mid X = x].$$

In the first line, the events $\{X + N \le z\}$ and $\{x + N \le z\}$ are quite different. The first involves the two random variables X and N, whereas the second only involves N and consequently is much simpler. We can then apply an expression such as Eq. (5.38) to obtain $P[Y \le z]$. The step we made in the example, however, is even more interesting. Since X and N are independent random variables, we can take the expression one step further:

$$P[Y \le z \mid X = x] = P[N \le z - x \mid X = x] = P[N \le z - x].$$

The independence of X and N allows us to dispense with the conditioning on x altogether!


Case 2: X Is a Continuous Random Variable 


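If X is a continuous random variable, then P[X = x] = 0 so Eq. (5.33) is undefined for all x. If X and Y have a joint pdf that is continuous and nonzero over some region of the plane, we define the conditional cdf of Y given X = x by the following limiting procedure:

$$F_Y(y \mid x) = \lim_{h \to 0} F_Y(y \mid x < X \le x + h). \qquad (5.42)$$

The conditional cdf on the right side of Eq. (5.42) is:

$$F_Y(y \mid x < X \le x + h) = \frac{P[Y \le y,\ x < X \le x + h]}{P[x < X \le x + h]} = \frac{\displaystyle\int_{-\infty}^{y}\int_{x}^{x+h} f_{X,Y}(x', y')\,dx'\,dy'}{\displaystyle\int_{x}^{x+h} f_X(x')\,dx'} \approx \frac{\displaystyle\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy'\;h}{f_X(x)\,h}. \qquad (5.43)$$

As we let h approach zero, Eqs. (5.42) and (5.43) imply that

$$F_Y(y \mid x) = \frac{\displaystyle\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy'}{f_X(x)}. \qquad (5.44)$$

The conditional pdf of Y given X = x is then:

$$f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}. \qquad (5.45)$$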
FIGURE 5.19
Interpretation of conditional pdf: $f_Y(y \mid x)\,dy \approx \dfrac{f_{X,Y}(x, y)\,dx\,dy}{f_X(x)\,dx}$.


It is easy to show that $f_Y(y \mid x)$ satisfies the properties of a pdf. We can interpret $f_Y(y \mid x)\,dy$ as the probability that Y is in the infinitesimal strip defined by (y, y + dy) given that X is in the infinitesimal strip defined by (x, x + dx), as shown in Fig. 5.19.

The probability of event A given X = x is obtained as follows:

$$P[Y \text{ in } A \mid X = x] = \int_{y \text{ in } A} f_Y(y \mid x)\,dy. \qquad (5.46)$$

There is a strong resemblance between Eq. (5.34) for the discrete case and Eq. (5.45) for the continuous case. Indeed many of the same properties hold. For example, we obtain the multiplication rule from Eq. (5.45):

$$f_{X,Y}(x, y) = f_Y(y \mid x)\,f_X(x) \quad \text{and} \quad f_{X,Y}(x, y) = f_X(x \mid y)\,f_Y(y). \qquad (5.47)$$

If X and Y are independent, then $f_{X,Y}(x, y) = f_X(x)f_Y(y)$ and $f_Y(y \mid x) = f_Y(y)$, $f_X(x \mid y) = f_X(x)$, $F_Y(y \mid x) = F_Y(y)$, and $F_X(x \mid y) = F_X(x)$.

By combining Eqs. (5.46) and (5.47), we can show that:

$$P[Y \text{ in } A] = \int_{-\infty}^{\infty} P[Y \text{ in } A \mid X = x]\,f_X(x)\,dx. \qquad (5.48)$$

You can think of Eq. (5.48) as the "continuous" version of the theorem on total probability. The following examples show the usefulness of the above results in calculating the probabilities of complicated events.
Example 5.32


Let X and Y be the random variables in Example 5.16. Find $f_X(x \mid y)$ and $f_Y(y \mid x)$.
Using the marginal pdf's obtained in Example 5.16, we have

$$f_X(x \mid y) = \frac{2e^{-x}e^{-y}}{2e^{-2y}} = e^{-(x - y)} \quad \text{for } x \ge y$$

$$f_Y(y \mid x) = \frac{2e^{-x}e^{-y}}{2e^{-x}(1 - e^{-x})} = \frac{e^{-y}}{1 - e^{-x}} \quad \text{for } 0 < y < x.$$

The conditional pdf of X is an exponential pdf shifted by y to the right. The conditional pdf of Y is an exponential pdf that has been truncated to the interval [0, x].


Example 5.33 Number of Arrivals During a Customer's Service Time 


The number N of customers that arrive at a service station during a time t is a Poisson random variable with parameter $\beta t$. The time T required to service each customer is an exponential random variable with parameter $\alpha$. Find the pmf for the number N that arrive during the service time T of a specific customer. Assume that the customer arrivals are independent of the customer service time.

Equation (5.48) holds even if Y is a discrete random variable, thus

$$P[N = k] = \int_0^{\infty} P[N = k \mid T = t]\,f_T(t)\,dt = \int_0^{\infty} \frac{(\beta t)^k}{k!}\,e^{-\beta t}\,\alpha e^{-\alpha t}\,dt = \frac{\alpha\beta^k}{k!}\int_0^{\infty} t^k e^{-(\alpha + \beta)t}\,dt.$$

Let $r = (\alpha + \beta)t$, then

$$P[N = k] = \frac{\alpha\beta^k}{k!\,(\alpha + \beta)^{k+1}}\int_0^{\infty} r^k e^{-r}\,dr = \frac{\alpha\beta^k}{(\alpha + \beta)^{k+1}} = \left(\frac{\alpha}{\alpha + \beta}\right)\left(\frac{\beta}{\alpha + \beta}\right)^k,$$

where we have used the fact that the last integral is a gamma function and is equal to k!. Thus N is a geometric random variable with probability of "success" $\alpha/(\alpha + \beta)$. Each time a customer arrives we can imagine that a new Bernoulli trial begins where "success" occurs if the customer's service time is completed before the next arrival.
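
The Bernoulli-trial interpretation suggests a direct simulation. The sketch below rests on an assumption not stated in the example: the Poisson arrival counts are generated by a Poisson process, so that interarrival times are independent exponentials with rate $\beta$. The rates $\alpha = 2$ and $\beta = 3$, the trial count, and all names are illustrative.

```python
import random


def arrivals_during_service(rng, alpha, beta):
    """Count arrivals (rate beta) that occur within one Exp(alpha) service time."""
    service_time = rng.expovariate(alpha)
    count = 0
    next_arrival = rng.expovariate(beta)
    while next_arrival <= service_time:
        count += 1
        next_arrival += rng.expovariate(beta)
    return count


def estimate_pmf(alpha, beta, max_k=3, num_trials=100_000, seed=11):
    """Relative frequencies of N = 0, 1, ..., max_k."""
    rng = random.Random(seed)
    counts = [0] * (max_k + 1)
    for _ in range(num_trials):
        n = arrivals_during_service(rng, alpha, beta)
        if n <= max_k:
            counts[n] += 1
    return [c / num_trials for c in counts]


def geometric_pmf(k, alpha, beta):
    """Closed form from the example: (alpha/(alpha+beta)) (beta/(alpha+beta))^k."""
    success = alpha / (alpha + beta)
    return success * (1.0 - success) ** k


ALPHA, BETA = 2.0, 3.0   # illustrative rates (assumption)
estimates = estimate_pmf(ALPHA, BETA)
for k, value in enumerate(estimates):
    print(k, value, geometric_pmf(k, ALPHA, BETA))
```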


Example 5.34 


X is selected at random from the unit interval; Y is then selected at random from the interval (0, X). Find the cdf of Y.

When X = x, Y is uniformly distributed in (0, x) so the conditional cdf given X = x is

$$P[Y \le y \mid X = x] = \begin{cases} y/x & 0 \le y \le x \\ 1 & x < y. \end{cases}$$

Equation (5.48) and the above conditional cdf yield:

$$F_Y(y) = P[Y \le y] = \int_0^1 P[Y \le y \mid X = x]\,f_X(x)\,dx = \int_0^y 1\,dx' + \int_y^1 \frac{y}{x'}\,dx' = y - y\ln y.$$

The corresponding pdf is obtained by taking the derivative of the cdf:

$$f_Y(y) = -\ln y \qquad 0 \le y \le 1.$$
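
The two-stage experiment is simple to simulate, which gives a check on the cdf $y - y\ln y$. The evaluation points, trial count, seed, and function names in this sketch are illustrative assumptions.

```python
import math
import random


def cdf_y(y):
    """Closed-form cdf from the example, valid for 0 < y <= 1."""
    return y - y * math.log(y)


def empirical_cdf(points, num_trials=200_000, seed=5):
    """Relative frequency of {Y <= point} for each point."""
    rng = random.Random(seed)
    counts = [0] * len(points)
    for _ in range(num_trials):
        x = rng.random()            # X uniform on the unit interval
        y = rng.uniform(0.0, x)     # given X = x, Y uniform on (0, x)
        for i, point in enumerate(points):
            if y <= point:
                counts[i] += 1
    return [c / num_trials for c in counts]


POINTS = (0.2, 0.5, 0.8)
estimates = empirical_cdf(POINTS)
for point, value in zip(POINTS, estimates):
    print(point, value, cdf_y(point))
```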


Example 5.35 Maximum A Posteriori Receiver 


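For the communications system in Example 5.31, find the probability that the input was X = +1
given that the output of the channel is Y = y.

This is a tricky version of Bayes’ rule. Condition on the event $\{y < Y \le y + \Delta\}$ instead
of $\{Y = y\}$:

$$
\begin{aligned}
P[X = +1 \mid y < Y < y + \Delta]
&= \frac{P[y < Y < y + \Delta \mid X = +1]\, P[X = +1]}{P[y < Y < y + \Delta]} \\
&= \frac{f_Y(y \mid +1)\,\Delta\,(1/3)}{f_Y(y \mid +1)\,\Delta\,(1/3) + f_Y(y \mid -1)\,\Delta\,(2/3)} \\
&= \frac{\dfrac{1}{\sqrt{2\pi}} e^{-(y-1)^2/2}\,(1/3)}{\dfrac{1}{\sqrt{2\pi}} e^{-(y-1)^2/2}\,(1/3) + \dfrac{1}{\sqrt{2\pi}} e^{-(y+1)^2/2}\,(2/3)} \\
&= \frac{e^{-(y-1)^2/2}}{e^{-(y-1)^2/2} + 2e^{-(y+1)^2/2}} = \frac{1}{1 + 2e^{-2y}}.
\end{aligned}
$$

The above expression is equal to 1/2 when $y_T = 0.3466$. For $y > y_T$, X = +1 is more likely, and
for $y < y_T$, X = -1 is more likely. A receiver that selects the input X that is more likely given
Y = y is called a maximum a posteriori receiver.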
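
A short Octave sketch of this decision rule follows. It assumes, as in the expression above, prior probabilities 1/3 and 2/3 and unit-variance Gaussian noise; the function handles post and closed and the test outputs y are hypothetical names and values chosen for illustration.

```octave
% A posteriori probability P[X = +1 | Y = y] and the resulting MAP decision.
post   = @(y) (1/3)*exp(-(y-1).^2/2) ./ ...
              ((1/3)*exp(-(y-1).^2/2) + (2/3)*exp(-(y+1).^2/2));
closed = @(y) 1 ./ (1 + 2*exp(-2*y));   % simplified form of the same quantity
yT = log(2)/2;                          % threshold where the posterior is 1/2
y = [-1 0 1 2];                         % assumed channel outputs
xhat = 2*(post(y) > 0.5) - 1;           % MAP decision: +1 or -1
disp([y' post(y)' closed(y)' xhat'])
fprintf('threshold yT = %.4f\n', yT);
```

The threshold log(2)/2 is the solution of 2 exp(-2y) = 1, which reproduces the value 0.3466 quoted in the example.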


Conditional Expectation 


The conditional expectation of Y given X = x is defined by 


$$
E[Y \mid x] = \int_{-\infty}^{\infty} y\, f_Y(y \mid x)\, dy. \tag{5.49a}
$$

In the special case where X and Y are both discrete random variables we have:

$$
E[Y \mid x_k] = \sum_{y_j} y_j\, p_Y(y_j \mid x_k). \tag{5.49b}
$$


Clearly, E[Y | x] is simply the center of mass associated with the conditional pdf or pmf. 

The conditional expectation E[Y | x] can be viewed as defining a function of x:
g(x) = E[Y | x]. It therefore makes sense to talk about the random variable g(X) =
E[Y | X]. We can imagine that a random experiment is performed and a value for
X is obtained, say $X = x_0$, and then the value $g(x_0) = E[Y \mid x_0]$ is produced. We are interested
in E[g(X)] = E[E[Y | X]]. In particular, we now show that


E[Y] = E[E[Y|X]], (5.50) 

where the right-hand side is 
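$$
E[E[Y \mid X]] = \int_{-\infty}^{\infty} E[Y \mid x]\, f_X(x)\, dx \qquad X \text{ continuous} \tag{5.51a}
$$

$$
E[E[Y \mid X]] = \sum_{x_k} E[Y \mid x_k]\, p_X(x_k) \qquad X \text{ discrete.} \tag{5.51b}
$$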


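We prove Eq. (5.50) for the case where X and Y are jointly continuous random
variables:

$$
\begin{aligned}
E[E[Y \mid X]] &= \int_{-\infty}^{\infty} E[Y \mid x]\, f_X(x)\, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f_Y(y \mid x)\, dy\, f_X(x)\, dx \\
&= \int_{-\infty}^{\infty} y \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = E[Y].
\end{aligned}
$$

The above result also holds for the expected value of a function of Y:

$$
E[h(Y)] = E[E[h(Y) \mid X]].
$$

In particular, the kth moment of Y is given by

$$
E[Y^k] = E[E[Y^k \mid X]].
$$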


Example 5.36 Average Number of Defects in a Region 
Find the mean of Y in Example 5.30 using conditional expectation. 


$$
E[Y] = \sum_{k} E[Y \mid X = k]\, P[X = k] = \sum_{k} kp\, P[X = k] = pE[X] = p\alpha.
$$

The second equality uses the fact that E[Y | X = k] = kp since Y is binomial with parameters
k and p. Note that the second to the last equality holds for any pmf of X. The fact that X
is Poisson with mean $\alpha$ is not used until the last equality.
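
The sum in this example can be evaluated numerically as a check. In the Octave sketch below, the values of alpha and p and the truncation point of the Poisson sum are assumptions chosen only for illustration.

```octave
% Numerical check that E[Y] = p*alpha when X is Poisson and Y | X = k is binomial(k, p).
alpha = 4;                     % Poisson mean of X (assumed)
p = 0.25;                      % binomial success probability (assumed)
k = 0:60;                      % truncated range of X (assumed; the tail is negligible)
pX = exp(-alpha + k*log(alpha) - gammaln(k+1));   % Poisson pmf P[X = k]
EY_given_k = k * p;            % E[Y | X = k] for a binomial(k, p)
EY = sum(EY_given_k .* pX);    % sum over k of E[Y | X = k] P[X = k]
fprintf('E[Y] = %.6f, p*alpha = %.6f\n', EY, p*alpha);
```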


Example 5.37 Binary Communications Channel 


Find the mean of the output Y in the communications channel in Example 5.31. 
Since Y is a Gaussian random variable with mean +1 when X = +1, and —1 when 
X = —1, the conditional expected values of Y given X are: 


E[Y|+1]=1 and E[Y|-1]=-1. 


Equation (5.38b) implies 


$$
E[Y] = \sum_{k \in \{+1, -1\}} E[Y \mid X = k]\, P[X = k] = +1(1/3) - 1(2/3) = -1/3.
$$
The mean is negative because the X = —1 inputs occur twice as often as X = +1. 


Example 5.38 Average Number of Arrivals in a Service Time 


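Find the mean and variance of the number of customer arrivals N during the service time T of a
specific customer in Example 5.33.

N is a Poisson random variable with parameter $\beta t$ when T = t is given, so the first two
conditional moments are:

$$
E[N \mid T = t] = \beta t \qquad E[N^2 \mid T = t] = (\beta t) + (\beta t)^2.
$$

The first two moments of N are obtained from Eq. (5.50):

$$
E[N] = \int_0^\infty E[N \mid T = t]\, f_T(t)\, dt = \int_0^\infty \beta t\, f_T(t)\, dt = \beta E[T]
$$

$$
E[N^2] = \int_0^\infty E[N^2 \mid T = t]\, f_T(t)\, dt = \int_0^\infty \{\beta t + \beta^2 t^2\}\, f_T(t)\, dt
= \beta E[T] + \beta^2 E[T^2].
$$

The variance of N is then

$$
\begin{aligned}
\mathrm{VAR}[N] &= E[N^2] - (E[N])^2 \\
&= \beta^2 E[T^2] + \beta E[T] - \beta^2 (E[T])^2 \\
&= \beta^2\, \mathrm{VAR}[T] + \beta E[T].
\end{aligned}
$$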


Note that if T is not random (i.e., E[T] = constant and VAR[T] = 0) then the mean and
variance of N are those of a Poisson random variable with parameter $\beta E[T]$. When T is random,
the mean of N remains the same but the variance of N increases by the term $\beta^2\, \mathrm{VAR}[T]$, that is,
the variability of T causes greater variability in N. Up to this point, we have intentionally avoided
using the fact that T has an exponential distribution to emphasize that the above results hold
for any service time distribution $f_T(t)$. If T is exponential with parameter $\alpha$, then $E[T] = 1/\alpha$ and
$\mathrm{VAR}[T] = 1/\alpha^2$, so

$$
E[N] = \frac{\beta}{\alpha} \qquad \text{and} \qquad \mathrm{VAR}[N] = \frac{\beta^2}{\alpha^2} + \frac{\beta}{\alpha}.
$$
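
For the exponential case, these moments can be cross-checked against the geometric pmf found in Example 5.33. The Octave sketch below does so by direct summation; the rates alpha = 2 and beta = 3 and the truncation point are assumed values.

```octave
% Mean and variance of N from the geometric pmf of Example 5.33,
% compared with beta/alpha and beta^2/alpha^2 + beta/alpha.
alpha = 2;  beta = 3;          % assumed service and arrival rates
p = alpha/(alpha + beta);      % probability of "success"
k = 0:2000;                    % truncated support (assumed; the tail is negligible)
pN = p * (1 - p).^k;           % P[N = k]
EN = sum(k .* pN);             % mean of N
VN = sum(k.^2 .* pN) - EN^2;   % variance of N
fprintf('E[N]   = %.6f (formula %.6f)\n', EN, beta/alpha);
fprintf('VAR[N] = %.6f (formula %.6f)\n', VN, beta^2/alpha^2 + beta/alpha);
```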


5.8 FUNCTIONS OF TWO RANDOM VARIABLES


Quite often we are interested in one or more functions of the random variables associat- 
ed with some experiment. For example, if we make repeated measurements of the same 
random quantity, we might be interested in the maximum and minimum value in the set, 
as well as the sample mean and sample variance. In this section we present methods of 
determining the probabilities of events involving functions of two random variables. 


One Function of Two Random Variables 
Let the random variable Z be defined as a function of two random variables:

$$
Z = g(X, Y). \tag{5.52}
$$

The cdf of Z is found by first finding the equivalent event of $\{Z \le z\}$, that is, the set
$R_z = \{\mathbf{x} = (x, y) \text{ such that } g(\mathbf{x}) \le z\}$, then

$$
F_Z(z) = P[\mathbf{X} \in R_z] = \iint_{(x', y') \in R_z} f_{X,Y}(x', y')\, dx'\, dy'. \tag{5.53}
$$

The pdf of Z is then found by taking the derivative of $F_Z(z)$.


Example 5.39 Sum of Two Random Variables 


Let Z = X + Y. Find $F_Z(z)$ and $f_Z(z)$ in terms of the joint pdf of X and Y.
The cdf of Z is found by integrating the joint pdf of X and Y over the region of the plane
corresponding to the event $\{Z \le z\}$, as shown in Fig. 5.20:

$$
F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x'} f_{X,Y}(x', y')\, dy'\, dx'.
$$

FIGURE 5.20
$P[Z \le z] = P[X + Y \le z]$.

The pdf of Z is

$$
f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x', z - x')\, dx'. \tag{5.54}
$$

Thus the pdf for the sum of two random variables is given by a superposition integral.
If X and Y are independent random variables, then by Eq. (5.23) the pdf is given by the
convolution integral of the marginal pdf’s of X and Y:

$$
f_Z(z) = \int_{-\infty}^{\infty} f_X(x')\, f_Y(z - x')\, dx'. \tag{5.55}
$$


In Chapter 7 we show how transform methods are used to evaluate convolution integrals such as 
Eq. (5.55). 


Example 5.40 Sum of Nonindependent Gaussian Random Variables 


Find the pdf of the sum Z = X + Y of two zero-mean, unit-variance Gaussian random variables
with correlation coefficient $\rho = -1/2$.

The joint pdf for this pair of random variables was given in Example 5.18. The pdf of Z is
obtained by substituting the pdf for the joint Gaussian random variables into the superposition
integral found in Example 5.39:

$$
\begin{aligned}
f_Z(z) &= \int_{-\infty}^{\infty} f_{X,Y}(x', z - x')\, dx' \\
&= \frac{1}{2\pi (1 - \rho^2)^{1/2}} \int_{-\infty}^{\infty} e^{-[x'^2 - 2\rho x'(z - x') + (z - x')^2]/2(1 - \rho^2)}\, dx' \\
&= \frac{1}{2\pi (3/4)^{1/2}} \int_{-\infty}^{\infty} e^{-(x'^2 - x'z + z^2)/2(3/4)}\, dx'.
\end{aligned}
$$

After completing the square of the argument in the exponent we obtain

$$
f_Z(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}.
$$

Thus the sum of these two nonindependent Gaussian random variables is also a zero-mean, unit- 
variance Gaussian random variable. 


Example 5.41 A System with Standby Redundancy 


A system with standby redundancy has a single key component in operation and a duplicate of 
that component in standby mode. When the first component fails, the second component is put 
into operation. Find the pdf of the lifetime of the standby system if the components have inde- 
pendent exponentially distributed lifetimes with the same mean. 

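Let $T_1$ and $T_2$ be the lifetimes of the two components, then the system lifetime is
$T = T_1 + T_2$, and the pdf of T is given by Eq. (5.55). The terms in the integrand are

$$
f_{T_1}(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}
$$

$$
f_{T_2}(z - x) = \begin{cases} \lambda e^{-\lambda (z - x)} & z - x \ge 0 \\ 0 & x > z. \end{cases}
$$

Note that the first equation sets the lower limit of integration to 0 and the second equation sets
the upper limit to z. Equation (5.55) becomes

$$
f_T(z) = \int_0^z \lambda e^{-\lambda x}\, \lambda e^{-\lambda (z - x)}\, dx
= \lambda^2 e^{-\lambda z} \int_0^z dx = \lambda^2 z e^{-\lambda z}.
$$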


Thus T is an Erlang random variable with parameter m = 2. 
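
The convolution integral can also be approximated numerically. The Octave sketch below uses a Riemann sum (via conv) for two exponential pdfs and compares the result with the Erlang pdf; the rate lambda, the grid spacing dx, and the grid length are assumed values, and the approximation error is on the order of lambda^2 * dx.

```octave
% Riemann-sum approximation of the convolution of two exponential pdfs.
lambda = 1.5;                    % common rate of T1 and T2 (assumed)
dx = 0.005;                      % grid spacing (assumed)
x = 0:dx:10;                     % grid on which the pdfs are sampled
f1 = lambda * exp(-lambda * x);  % pdf of T1 (and of T2) for x >= 0
fT = conv(f1, f1) * dx;          % discrete approximation of the convolution integral
fT = fT(1:numel(x));             % keep the values that fall on the grid
erl = lambda^2 * x .* exp(-lambda * x);   % Erlang pdf with m = 2
fprintf('max abs error = %.4g\n', max(abs(fT - erl)));
```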


The conditional pdf can be used to find the pdf of a function of several random 
variables. Let Z = g(X, Y), and suppose we are given that Y = y, then Z = g(X, y) 
is a function of one random variable. Therefore we can use the methods developed in 
Section 4.5 for single random variables to find the pdf of Z given Y = y: fz(z|Y = y). 
The pdf of Z is then found from 


$$
f_Z(z) = \int_{-\infty}^{\infty} f_Z(z \mid y')\, f_Y(y')\, dy'.
$$


Example 5.42 


Let Z = X/Y. Find the pdf of Z if X and Y are independent and both exponentially distributed 
with mean one. 

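Assume Y = y, then Z = X/y is simply a scaled version of X. Therefore from Example
4.31

$$
f_Z(z \mid y) = |y|\, f_X(yz \mid y).
$$

The pdf of Z is therefore

$$
f_Z(z) = \int_{-\infty}^{\infty} |y'|\, f_X(y'z \mid y')\, f_Y(y')\, dy'
= \int_{-\infty}^{\infty} |y'|\, f_{X,Y}(y'z, y')\, dy'.
$$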


We now use the fact that X and Y are independent and exponentially distributed with mean one: 


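$$
f_Z(z) = \int_0^\infty y'\, f_X(y'z)\, f_Y(y')\, dy'
= \int_0^\infty y'\, e^{-y'z}\, e^{-y'}\, dy' = \frac{1}{(1 + z)^2} \qquad z > 0.
$$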
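
For unit-mean exponential X and Y the last integral evaluates to 1/(1 + z)^2. The Octave sketch below checks this by numerical integration at a few assumed values of z; the function handles fX and fY are illustrative names.

```octave
% Numerical evaluation of the pdf of Z = X/Y for unit-mean exponentials.
fX = @(x) exp(-x) .* (x >= 0);   % exponential pdf with mean one
fY = fX;                         % Y has the same pdf
z = [0.1 0.5 1 2 5];             % evaluation points (assumed)
fZ = zeros(size(z));
for i = 1:numel(z)
  zi = z(i);
  fZ(i) = quadgk(@(y) y .* fX(y*zi) .* fY(y), 0, Inf);
end
closed = 1 ./ (1 + z).^2;        % closed form of the integral
disp([z' fZ' closed'])           % columns: z, numerical value, 1/(1+z)^2
```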

Transformations of Two Random Variables


Let X and Y be random variables associated with some experiment, and let the random
variables $Z_1$ and $Z_2$ be defined by two functions of $\mathbf{X} = (X, Y)$:

$$
Z_1 = g_1(\mathbf{X}) \qquad \text{and} \qquad Z_2 = g_2(\mathbf{X}).
$$

We now consider the problem of finding the joint cdf and pdf of $Z_1$ and $Z_2$.
The joint cdf of $Z_1$ and $Z_2$ at the point $\mathbf{z} = (z_1, z_2)$ is equal to the probability of
the region of $\mathbf{x}$ where $g_k(\mathbf{x}) \le z_k$ for k = 1, 2:

$$
F_{Z_1, Z_2}(z_1, z_2) = P[g_1(\mathbf{X}) \le z_1,\ g_2(\mathbf{X}) \le z_2]. \tag{5.56a}
$$

If X, Y have a joint pdf, then

$$
F_{Z_1, Z_2}(z_1, z_2) = \iint_{\mathbf{x}':\, g_k(\mathbf{x}') \le z_k} f_{X,Y}(x', y')\, dx'\, dy'. \tag{5.56b}
$$


Example 5.43 
Let the random variables W and Z be defined by 
W = min(X, Y) and Z = max(X,Y). 


Find the joint cdf of W and Z in terms of the joint cdf of X and Y. 
Equation (5.56a) implies that

$$
F_{W,Z}(w, z) = P[\{\min(X, Y) \le w\} \cap \{\max(X, Y) \le z\}].
$$

The region corresponding to this event is shown in Fig. 5.21. From the figure it is clear that if
z > w, the above probability is the probability of the semi-infinite rectangle defined by the
point (z, z) minus the square region denoted by A.

FIGURE 5.21
$\{\min(X, Y) \le w\} = \{X \le w\} \cup \{Y \le w\}$ and
$\{\max(X, Y) \le z\} = \{X \le z\} \cap \{Y \le z\}$.

Thus if z > w,

$$
\begin{aligned}
F_{W,Z}(w, z) &= F_{X,Y}(z, z) - P[A] \\
&= F_{X,Y}(z, z) - \{F_{X,Y}(z, z) - F_{X,Y}(w, z) - F_{X,Y}(z, w) + F_{X,Y}(w, w)\} \\
&= F_{X,Y}(w, z) + F_{X,Y}(z, w) - F_{X,Y}(w, w).
\end{aligned}
$$

If z < w then

$$
F_{W,Z}(w, z) = F_{X,Y}(z, z).
$$
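
As an illustration, the two cases can be checked by simulation for a specific joint cdf. The Octave sketch below assumes X and Y are independent and uniform on (0, 1), so that the joint cdf is the product of the clipped arguments; this distribution, the sample size, and the test points are assumptions and are not part of the example.

```octave
% Monte Carlo check of the joint cdf of W = min(X,Y) and Z = max(X,Y).
n = 200000;                               % sample size (assumed)
X = rand(n, 1);  Y = rand(n, 1);          % independent uniform(0,1) pair (assumed)
W = min(X, Y);   Z = max(X, Y);
Fxy = @(x, y) min(max(x, 0), 1) .* min(max(y, 0), 1);   % joint cdf of (X, Y)

w = 0.3;  z = 0.7;                        % case z > w
est1  = mean(W <= w & Z <= z);
form1 = Fxy(w, z) + Fxy(z, w) - Fxy(w, w);

w2 = 0.7; z2 = 0.3;                       % case z < w
est2  = mean(W <= w2 & Z <= z2);
form2 = Fxy(z2, z2);

fprintf('z > w: estimate %.4f, formula %.4f\n', est1, form1);
fprintf('z < w: estimate %.4f, formula %.4f\n', est2, form2);
```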


Example 5.44 Radius and Angle of Independent Gaussian Random Variables 


Let X and Y be zero-mean, unit-variance independent Gaussian random variables. Find the joint
cdf and pdf of R and $\Theta$, the radius and angle of the point (X, Y):

$$
R = (X^2 + Y^2)^{1/2} \qquad \Theta = \tan^{-1}(Y/X).
$$

The joint cdf of R and $\Theta$ is:

$$
F_{R,\Theta}(r_0, \theta_0) = P[R \le r_0, \Theta \le \theta_0]
= \iint_{(x, y) \in R_{(r_0, \theta_0)}} \frac{e^{-(x^2 + y^2)/2}}{2\pi}\, dx\, dy
$$

where

$$
R_{(r_0, \theta_0)} = \{(x, y) : \sqrt{x^2 + y^2} \le r_0,\ 0 < \tan^{-1}(y/x) \le \theta_0\}.
$$

The region $R_{(r_0, \theta_0)}$ is the pie-shaped region in Fig. 5.22. We change variables from Cartesian to
polar coordinates to obtain:

$$
\begin{aligned}
F_{R,\Theta}(r_0, \theta_0) &= P[R \le r_0, \Theta \le \theta_0]
= \int_0^{r_0} \int_0^{\theta_0} \frac{e^{-r^2/2}}{2\pi}\, r\, dr\, d\theta \\
&= \frac{\theta_0}{2\pi}\left(1 - e^{-r_0^2/2}\right) \qquad 0 < \theta_0 < 2\pi \quad 0 < r_0 < \infty.
\end{aligned} \tag{5.57}
$$

FIGURE 5.22
Region of integration $R_{(r_0, \theta_0)}$ in Example 5.44.

R and $\Theta$ are independent random variables, where R has a Rayleigh distribution and $\Theta$ is
uniformly distributed in $(0, 2\pi)$. The joint pdf is obtained by taking partial derivatives with
respect to r and $\theta$:

$$
f_{R,\Theta}(r, \theta) = \frac{\partial^2}{\partial r\, \partial \theta}\, \frac{\theta}{2\pi}\left(1 - e^{-r^2/2}\right)
= \frac{1}{2\pi}\left(r e^{-r^2/2}\right), \qquad 0 < \theta < 2\pi \quad 0 < r < \infty.
$$

This transformation maps every point in the plane from Cartesian coordinates to polar
coordinates. We can also go backwards from polar to Cartesian coordinates. First we generate
independent Rayleigh R and uniform $\Theta$ random variables. We then transform R and $\Theta$ into Cartesian
coordinates to obtain an independent pair of zero-mean, unit-variance Gaussians. Neat!


pdf of Linear Transformations 


The joint pdf of Z can be found directly in terms of the joint pdf of X by finding the 
equivalent events of infinitesimal rectangles. We consider the linear transformation of 
two random variables: 


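$$
\begin{aligned} V &= aX + bY \\ W &= cX + eY \end{aligned}
\qquad \text{or} \qquad
\begin{bmatrix} V \\ W \end{bmatrix} = \begin{bmatrix} a & b \\ c & e \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}.
$$

Denote the above matrix by A. We will assume that A has an inverse, that is, it has determinant
$|ae - bc| \ne 0$, so each point (v, w) has a unique corresponding point (x, y)
obtained from

$$
\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1} \begin{bmatrix} v \\ w \end{bmatrix}. \tag{5.58}
$$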


Consider the infinitesimal rectangle shown in Fig. 5.23. The points in this rectangle are 
mapped into the parallelogram shown in the figure. The infinitesimal rectangle and the 
parallelogram are equivalent events, so their probabilities must be equal. Thus 


$$
f_{X,Y}(x, y)\, dx\, dy = f_{V,W}(v, w)\, dP
$$

where dP is the area of the parallelogram. The joint pdf of V and W is thus given by

$$
f_{V,W}(v, w) = \frac{f_{X,Y}(x, y)}{\left| \dfrac{dP}{dx\, dy} \right|}, \tag{5.59}
$$

where x and y are related to (v, w) by Eq. (5.58). Equation (5.59) states that the joint
pdf of V and W at (v, w) is the pdf of X and Y at the corresponding point (x, y), but
rescaled by the “stretch factor” dP/dx dy. It can be shown that $dP = (|ae - bc|)\, dx\, dy$,
so the “stretch factor” is

$$
\left| \frac{dP}{dx\, dy} \right| = \frac{|ae - bc|\,(dx\, dy)}{(dx\, dy)} = |ae - bc| = |A|,
$$

where |A| is the determinant of A.

FIGURE 5.23
Image of an infinitesimal rectangle under a linear transformation. The rectangle with corners
(x, y), (x + dx, y), (x, y + dy), (x + dx, y + dy) is mapped by v = ax + by, w = cx + ey into the
parallelogram with corners (v, w), (v + a dx, w + c dx), (v + b dy, w + e dy),
(v + a dx + b dy, w + c dx + e dy).

The above result can be written compactly using matrix notation. Let the vector
$\mathbf{Z}$ be

$$
\mathbf{Z} = A\mathbf{X},
$$

where A is an n × n invertible matrix. The joint pdf of $\mathbf{Z}$ is then

$$
f_{\mathbf{Z}}(\mathbf{z}) = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{z})}{|A|}. \tag{5.60}
$$


Example 5.45 Linear Transformation of Jointly Gaussian Random Variables 


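Let X and Y be the jointly Gaussian random variables introduced in Example 5.18. Let V and W
be obtained from (X, Y) by

$$
\begin{bmatrix} V \\ W \end{bmatrix}
= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}
= A \begin{bmatrix} X \\ Y \end{bmatrix}.
$$

Find the joint pdf of V and W.
The determinant of the matrix is |A| = 1, and the inverse mapping is given by

$$
\begin{bmatrix} X \\ Y \end{bmatrix}
= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} V \\ W \end{bmatrix},
$$

so $X = (V - W)/\sqrt{2}$ and $Y = (V + W)/\sqrt{2}$. Therefore the pdf of V and W is

$$
f_{V,W}(v, w) = f_{X,Y}\!\left( \frac{v - w}{\sqrt{2}}, \frac{v + w}{\sqrt{2}} \right)
$$

where

$$
f_{X,Y}(x, y) = \frac{1}{2\pi \sqrt{1 - \rho^2}}\, e^{-(x^2 - 2\rho x y + y^2)/2(1 - \rho^2)}.
$$

By substituting for x and y, the argument of the exponent becomes

$$
\frac{(v - w)^2/2 - 2\rho (v - w)(v + w)/2 + (v + w)^2/2}{2(1 - \rho^2)}
= \frac{v^2}{2(1 + \rho)} + \frac{w^2}{2(1 - \rho)}.
$$

Thus

$$
f_{V,W}(v, w) = \frac{1}{2\pi (1 - \rho^2)^{1/2}}\, e^{-\{[v^2/2(1 + \rho)] + [w^2/2(1 - \rho)]\}}.
$$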


It can be seen that the transformed variables V and W are independent, zero-mean Gaussian
random variables with variance $1 + \rho$ and $1 - \rho$, respectively. Figure 5.24 shows contours of
equal value of the joint pdf of (X, Y). It can be seen that the pdf has elliptical symmetry about 
the origin with principal axes at 45° with respect to the axes of the plane. In Section 5.9 we show 
that the above linear transformation corresponds to a rotation of the coordinate system so that 
the axes of the plane are aligned with the axes of the ellipse. 
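
The variances and the lack of correlation of V and W can be confirmed with a few lines of matrix arithmetic. The Octave sketch below relies on the standard identity that the covariance matrix of AX is A K A', which is not derived in this passage; the value rho = -0.5 is an assumption chosen for illustration.

```octave
% Covariance matrix of (V, W) for the 45-degree transformation of Example 5.45.
rho = -0.5;                        % correlation coefficient of X and Y (assumed)
K = [1 rho; rho 1];                % covariance matrix of the unit-variance pair (X, Y)
A = [1 1; -1 1] / sqrt(2);         % transformation matrix of the example
KVW = A * K * A';                  % covariance matrix of (V, W)
disp(KVW)                          % diagonal entries 1 + rho and 1 - rho
fprintf('det(A) = %.4f\n', det(A));
```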


5.9 PAIRS OF JOINTLY GAUSSIAN RANDOM VARIABLES 


The jointly Gaussian random variables appear in numerous applications in electrical 
engineering. They are frequently used to model signals in signal processing applications, 
and they are the most important model used in communication systems that involve 
dealing with signals in the presence of noise. They also play a central role in many sta- 
tistical methods. 

The random variables X and Y are said to be jointly Gaussian if their joint pdf 
has the form 


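$$
f_{X,Y}(x, y) = \frac{\exp\left\{ \dfrac{-1}{2(1 - \rho_{X,Y}^2)} \left[ \left( \dfrac{x - m_1}{\sigma_1} \right)^2 - 2\rho_{X,Y} \left( \dfrac{x - m_1}{\sigma_1} \right)\left( \dfrac{y - m_2}{\sigma_2} \right) + \left( \dfrac{y - m_2}{\sigma_2} \right)^2 \right] \right\}}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho_{X,Y}^2}} \tag{5.61a}
$$

for $-\infty < x < \infty$ and $-\infty < y < \infty$.
The pdf is centered at the point $(m_1, m_2)$, and it has a bell shape that depends on
the values of $\sigma_1$, $\sigma_2$, and $\rho_{X,Y}$ as shown in Fig. 5.25. As shown in the figure, the pdf is
constant for values x and y for which the argument of the exponent is constant:

$$
\left[ \left( \frac{x - m_1}{\sigma_1} \right)^2 - 2\rho_{X,Y} \left( \frac{x - m_1}{\sigma_1} \right)\left( \frac{y - m_2}{\sigma_2} \right) + \left( \frac{y - m_2}{\sigma_2} \right)^2 \right] = \text{constant}. \tag{5.61b}
$$

FIGURE 5.24
Contours of equal value of joint Gaussian pdf
discussed in Example 5.45.

FIGURE 5.25
Jointly Gaussian pdf (a) $\rho = 0$ (b) $\rho = -0.9$.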


Figure 5.26 shows the orientation of these elliptical contours for various values of $\sigma_1$, $\sigma_2$,
and $\rho_{X,Y}$. When $\rho_{X,Y} = 0$, that is, when X and Y are independent, the equal-pdf contour
is an ellipse with principal axes aligned with the x- and y-axes. When $\rho_{X,Y} \ne 0$, the major
axis of the ellipse is oriented along the angle [Edwards and Penney, pp. 570-571]

$$
\theta = \frac{1}{2} \arctan\left( \frac{2\rho_{X,Y}\, \sigma_1 \sigma_2}{\sigma_1^2 - \sigma_2^2} \right). \tag{5.62}
$$

Note that the angle is 45° when the variances are equal.

FIGURE 5.26
Orientation of contours of equal value of joint Gaussian pdf for $\rho_{X,Y} > 0$.


The marginal pdf of X is found by integrating $f_{X,Y}(x, y)$ over all y. The integration
is carried out by completing the square in the exponent as was done in Example
5.18. The result is that the marginal pdf of X is

$$
f_X(x) = \frac{e^{-(x - m_1)^2/2\sigma_1^2}}{\sqrt{2\pi}\, \sigma_1}, \tag{5.63}
$$

that is, X is a Gaussian random variable with mean $m_1$ and variance $\sigma_1^2$. Similarly, the
marginal pdf for Y is found to be Gaussian with mean $m_2$ and variance $\sigma_2^2$.


The conditional pdf’s $f_X(x \mid y)$ and $f_Y(y \mid x)$ give us information about the interrelation
between X and Y. The conditional pdf of X given Y = y is

$$
f_X(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}
= \frac{\exp\left\{ \dfrac{-1}{2(1 - \rho_{X,Y}^2)\sigma_1^2} \left[ x - \rho_{X,Y} \dfrac{\sigma_1}{\sigma_2}(y - m_2) - m_1 \right]^2 \right\}}{\sqrt{2\pi \sigma_1^2 (1 - \rho_{X,Y}^2)}}. \tag{5.64}
$$

Equation (5.64) shows that the conditional pdf of X given Y = y is also Gaussian but with
conditional mean $m_1 + \rho_{X,Y}(\sigma_1/\sigma_2)(y - m_2)$ and conditional variance $\sigma_1^2(1 - \rho_{X,Y}^2)$.
Note that when $\rho_{X,Y} = 0$, the conditional pdf of X given Y = y equals the marginal pdf
of X. This is consistent with the fact that X and Y are independent when $\rho_{X,Y} = 0$. On the
other hand, as $|\rho_{X,Y}| \to 1$ the variance of X about the conditional mean approaches zero,
so the conditional pdf approaches a delta function at the conditional mean. Thus when
$|\rho_{X,Y}| = 1$, the conditional variance is zero and X is equal to the conditional mean with
probability one. We note that similarly $f_Y(y \mid x)$ is Gaussian with conditional mean
$m_2 + \rho_{X,Y}(\sigma_2/\sigma_1)(x - m_1)$ and conditional variance $\sigma_2^2(1 - \rho_{X,Y}^2)$.

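We now show that the $\rho_{X,Y}$ in Eq. (5.61a) is indeed the correlation coefficient
between X and Y. The covariance between X and Y is defined by

$$
\mathrm{COV}(X, Y) = E[(X - m_1)(Y - m_2)] = E[E[(X - m_1)(Y - m_2) \mid Y]].
$$

Now the conditional expectation of $(X - m_1)(Y - m_2)$ given Y = y is

$$
\begin{aligned}
E[(X - m_1)(Y - m_2) \mid Y = y] &= (y - m_2)\, E[X - m_1 \mid Y = y] \\
&= (y - m_2)\left( E[X \mid Y = y] - m_1 \right) \\
&= (y - m_2)\left( \rho_{X,Y} \frac{\sigma_1}{\sigma_2}(y - m_2) \right),
\end{aligned}
$$

where we have used the fact that the conditional mean of X given Y = y is
$m_1 + \rho_{X,Y}(\sigma_1/\sigma_2)(y - m_2)$. Therefore

$$
E[(X - m_1)(Y - m_2) \mid Y] = \rho_{X,Y} \frac{\sigma_1}{\sigma_2}(Y - m_2)^2
$$

and

$$
\mathrm{COV}(X, Y) = E[E[(X - m_1)(Y - m_2) \mid Y]] = \rho_{X,Y} \frac{\sigma_1}{\sigma_2} E[(Y - m_2)^2]
= \rho_{X,Y}\, \sigma_1 \sigma_2.
$$

The above equation is consistent with the definition of the correlation coefficient,
$\rho_{X,Y} = \mathrm{COV}(X, Y)/\sigma_1 \sigma_2$. Thus the $\rho_{X,Y}$ in Eq. (5.61a) is indeed the correlation
coefficient between X and Y.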


Example 5.46 


The amount of yearly rainfall in city 1 and in city 2 is modeled by a pair of jointly Gaussian random
variables, X and Y, with pdf given by Eq. (5.61a). Find the most likely value of X given that we know Y = y.

The most likely value of X given Y = y is the value of x for which $f_X(x \mid y)$ is maximum. The
conditional pdf of X given Y = y is given by Eq. (5.64), which is maximum at the conditional mean

$$
E[X \mid y] = m_1 + \rho_{X,Y} \frac{\sigma_1}{\sigma_2}(y - m_2).
$$


Note that this “maximum likelihood” estimate is a linear function of the observation y. 
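
The following Octave sketch illustrates this estimate with made-up numbers. The means, standard deviations, correlation coefficient, and observed value y are all assumed for illustration; the sketch locates the maximum of the joint pdf of Eq. (5.61a) along the line Y = y on a grid and compares it with the conditional mean.

```octave
% Most likely rainfall in city 1 given the rainfall in city 2 (all values assumed).
m1 = 30;  m2 = 40;            % means of X and Y (assumed)
s1 = 5;   s2 = 8;             % standard deviations of X and Y (assumed)
rho = 0.6;                    % correlation coefficient (assumed)
y = 48;                       % observed value of Y (assumed)

cond_mean = @(y) m1 + rho*(s1/s2)*(y - m2);   % conditional mean of X given Y = y
cond_var  = s1^2 * (1 - rho^2);               % conditional variance of X given Y = y

% Locate the maximum over x of the exponent in Eq. (5.61a) with y held fixed.
x = 0:0.01:60;
q = ((x-m1)/s1).^2 - 2*rho*((x-m1)/s1).*((y-m2)/s2) + ((y-m2)/s2).^2;
f = exp(-q / (2*(1 - rho^2)));                % joint pdf up to a constant factor
[~, idx] = max(f);
x_mode = x(idx);

fprintf('conditional mean = %.2f, grid maximum at x = %.2f, conditional variance = %.2f\n', ...
        cond_mean(y), x_mode, cond_var);
```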




Example 5.47 Estimation of Signal in Noise 


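Let Y = X + N where X (the “signal”) and N (the “noise”) are independent zero-mean Gaussian
random variables with different variances. Find the correlation coefficient between the observed
signal Y and the desired signal X. Find the value of x that maximizes $f_X(x \mid y)$.

The mean and variance of Y and the covariance of X and Y are:

$$
\begin{aligned}
E[Y] &= E[X] + E[N] = 0 \\
\sigma_Y^2 &= E[Y^2] = E[(X + N)^2] = E[X^2 + 2XN + N^2] = E[X^2] + E[N^2] = \sigma_X^2 + \sigma_N^2 \\
\mathrm{COV}(X, Y) &= E[(X - E[X])(Y - E[Y])] = E[XY] = E[X(X + N)] = \sigma_X^2.
\end{aligned}
$$

Therefore, the correlation coefficient is:

$$
\rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sigma_X}{\sigma_Y}
= \frac{\sigma_X}{(\sigma_X^2 + \sigma_N^2)^{1/2}} = \frac{1}{\left( 1 + \dfrac{\sigma_N^2}{\sigma_X^2} \right)^{1/2}}.
$$

Note that $\rho_{X,Y}^2 = \sigma_X^2/\sigma_Y^2 = 1 - \sigma_N^2/\sigma_Y^2$.
To find the joint pdf of X and Y consider the following linear transformation:

$$
\begin{aligned} X &= X \\ Y &= X + N \end{aligned}
\qquad \text{which has inverse} \qquad
\begin{aligned} X &= X \\ N &= -X + Y. \end{aligned}
$$

From Eq. (5.52) we have:

$$
\begin{aligned}
f_{X,Y}(x, y) &= \left. \frac{f_{X,N}(x, n)}{|\det A|} \right|_{x = x,\ n = y - x}
= \left. \frac{e^{-x^2/2\sigma_X^2}}{\sqrt{2\pi}\, \sigma_X}\, \frac{e^{-n^2/2\sigma_N^2}}{\sqrt{2\pi}\, \sigma_N} \right|_{x = x,\ n = y - x} \\
&= \frac{e^{-x^2/2\sigma_X^2}}{\sqrt{2\pi}\, \sigma_X}\, \frac{e^{-(y - x)^2/2\sigma_N^2}}{\sqrt{2\pi}\, \sigma_N}.
\end{aligned}
$$

The conditional pdf of the signal X given the observation Y is then:

$$
\begin{aligned}
f_X(x \mid y) &= \frac{f_{X,Y}(x, y)}{f_Y(y)}
= \frac{e^{-x^2/2\sigma_X^2}}{\sqrt{2\pi}\, \sigma_X}\, \frac{e^{-(y - x)^2/2\sigma_N^2}}{\sqrt{2\pi}\, \sigma_N}\, \frac{\sqrt{2\pi}\, \sigma_Y}{e^{-y^2/2\sigma_Y^2}} \\
&= \frac{\exp\left\{ -\dfrac{1}{2}\left[ \left( \dfrac{x}{\sigma_X} \right)^2 + \left( \dfrac{y - x}{\sigma_N} \right)^2 - \left( \dfrac{y}{\sigma_Y} \right)^2 \right] \right\}}{\sqrt{2\pi}\, \sigma_N \sigma_X / \sigma_Y} \\
&= \frac{\exp\left\{ -\dfrac{1}{2}\, \dfrac{\sigma_Y^2}{\sigma_X^2 \sigma_N^2} \left( x - \dfrac{\sigma_X^2}{\sigma_Y^2}\, y \right)^2 \right\}}{\sqrt{2\pi}\, \sigma_N \sigma_X / \sigma_Y} \\
&= \frac{\exp\left\{ -\dfrac{1}{2(1 - \rho_{X,Y}^2)\sigma_X^2} \left( x - \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\, y \right)^2 \right\}}{\sqrt{2\pi (1 - \rho_{X,Y}^2)}\, \sigma_X}.
\end{aligned}
$$

This pdf has its maximum value when the argument of the exponent is zero, that is,

$$
x = \left( \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2} \right) y = \left( \frac{1}{1 + \dfrac{\sigma_N^2}{\sigma_X^2}} \right) y.
$$

The signal-to-noise ratio (SNR) is defined as the ratio of the variance of X and the variance of N.
At high SNRs this estimator gives $x \approx y$, and at very low signal-to-noise ratios, it gives $x \approx 0$.

FIGURE 5.27
A rotation of the coordinate system transforms a pair of
dependent Gaussian random variables into a pair of independent
Gaussian random variables.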
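
A small Octave sketch of this estimator follows. The estimate scales the observation by the ratio of the signal variance to the sum of the signal and noise variances. The variances, the sample size, and the names xhat and r_emp are assumptions made for illustration; the simulated correlation coefficient is compared with the ratio of the signal standard deviation to the standard deviation of Y.

```octave
% Estimate of the signal X from the observation Y = X + N (variances assumed).
sx2 = 4;                           % signal variance (assumed)
sn2 = 1;                           % noise variance (assumed)
xhat = @(y) (sx2/(sx2 + sn2)) * y; % value of x that maximizes f_X(x | y)
rho = sqrt(sx2/(sx2 + sn2));       % correlation coefficient between X and Y

n = 100000;                        % sample size (assumed)
X = sqrt(sx2) * randn(n, 1);       % zero-mean Gaussian signal
N = sqrt(sn2) * randn(n, 1);       % zero-mean Gaussian noise, independent of X
Y = X + N;
r_emp = mean((X - mean(X)) .* (Y - mean(Y))) / (std(X) * std(Y));

fprintf('SNR = %.2f, xhat(5) = %.3f\n', sx2/sn2, xhat(5));
fprintf('rho = %.4f, sample correlation = %.4f\n', rho, r_emp);
```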


Example 5.48 Rotation of Jointly Gaussian Random Variables 


The ellipse corresponding to an arbitrary two-dimensional Gaussian vector forms an angle

$$
\theta = \frac{1}{2} \arctan\left( \frac{2\rho\, \sigma_1 \sigma_2}{\sigma_1^2 - \sigma_2^2} \right)
$$

relative to the x-axis. Suppose we define a new coordinate system whose axes are aligned with those
of the ellipse as shown in Fig. 5.27. This is accomplished by using the following rotation matrix:

$$
\begin{bmatrix} V \\ W \end{bmatrix}
= \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}.
$$

To show that the new random variables are independent it suffices to show that they have
covariance zero:

$$
\begin{aligned}
\mathrm{COV}(V, W) &= E[(V - E[V])(W - E[W])] \\
&= E[\{(X - m_1)\cos\theta + (Y - m_2)\sin\theta\} \times \{-(X - m_1)\sin\theta + (Y - m_2)\cos\theta\}] \\
&= -\sigma_1^2 \sin\theta \cos\theta + \mathrm{COV}(X, Y)\cos^2\theta - \mathrm{COV}(X, Y)\sin^2\theta + \sigma_2^2 \sin\theta \cos\theta \\
&= \frac{(\sigma_2^2 - \sigma_1^2)\sin 2\theta + 2\, \mathrm{COV}(X, Y)\cos 2\theta}{2} \\
&= \frac{\cos 2\theta\, [(\sigma_2^2 - \sigma_1^2)\tan 2\theta + 2\, \mathrm{COV}(X, Y)]}{2}.
\end{aligned}
$$

If we let the angle of rotation $\theta$ be such that

$$
\tan 2\theta = \frac{2\, \mathrm{COV}(X, Y)}{\sigma_1^2 - \sigma_2^2},
$$

then the covariance of V and W is zero as required.
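
The rotation can be checked numerically. The Octave sketch below picks assumed values for the standard deviations and the correlation coefficient, computes the rotation angle, and applies the rotation to the covariance matrix. It uses atan2 rather than a plain arctangent so that the equal-variance case does not divide by zero; that substitution, like the identity that the covariance matrix of a rotated pair is Rm*K*Rm', is a choice made for this sketch and is not taken from the example.

```octave
% Rotation that removes the correlation between jointly Gaussian X and Y.
s1 = 2;  s2 = 1;  rho = 0.7;              % assumed standard deviations and correlation
c = rho * s1 * s2;                        % COV(X, Y)
K = [s1^2 c; c s2^2];                     % covariance matrix of (X, Y)
theta = 0.5 * atan2(2*c, s1^2 - s2^2);    % angle with tan(2*theta) = 2c/(s1^2 - s2^2)
Rm = [cos(theta) sin(theta); -sin(theta) cos(theta)];   % rotation matrix
KVW = Rm * K * Rm';                       % covariance matrix of (V, W)
fprintf('theta = %.4f rad, COV(V,W) = %.2e\n', theta, KVW(1,2));
disp(KVW)
```

The off-diagonal entries of KVW should be zero up to rounding error, and the diagonal entries are the variances of V and W.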


GENERATING INDEPENDENT GAUSSIAN RANDOM VARIABLES 


We now present a method for generating unit-variance, uncorrelated (and hence independent)
jointly Gaussian random variables. Suppose that X and Y are two independent
zero-mean, unit-variance jointly Gaussian random variables with pdf:

$$
f_{X,Y}(x, y) = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2}.
$$

In Example 5.44 we saw that the transformation

$$
R = \sqrt{X^2 + Y^2} \qquad \text{and} \qquad \Theta = \tan^{-1} Y/X
$$

leads to the pair of independent random variables

$$
f_{R,\Theta}(r, \theta) = \frac{1}{2\pi}\, r e^{-r^2/2} = f_R(r)\, f_\Theta(\theta),
$$

where R is a Rayleigh random variable and $\Theta$ is a uniform random variable. The above
transformation is invertible. Therefore we can also start with independent Rayleigh
and uniform random variables and produce zero-mean, unit-variance independent
Gaussian random variables through the transformation:

$$
X = R\cos\Theta \qquad \text{and} \qquad Y = R\sin\Theta. \tag{5.65}
$$

Consider $W = R^2$ where R is a Rayleigh random variable. From Example 5.41
we then have that W has pdf

$$
f_W(w) = \frac{f_R(\sqrt{w})}{2\sqrt{w}} = \frac{\sqrt{w}\, e^{-(\sqrt{w})^2/2}}{2\sqrt{w}} = \frac{1}{2}\, e^{-w/2}.
$$

$W = R^2$ has an exponential distribution with $\lambda = 1/2$.

Therefore we can generate $R^2$ by generating an exponential random variable
with parameter 1/2, and we can generate $\Theta$ by generating a random variable that is
uniformly distributed in the interval $(0, 2\pi)$. If we substitute these random variables
into Eq. (5.65), we then obtain a pair of independent zero-mean, unit-variance Gaussian
random variables. The above discussion thus leads to the following algorithm:


1. Generate $U_1$ and $U_2$, two independent random variables uniformly distributed in
the unit interval.
2. Let $R^2 = -2 \log U_1$ and $\Theta = 2\pi U_2$.
3. Let $X = R\cos\Theta = (-2 \log U_1)^{1/2} \cos 2\pi U_2$ and $Y = R\sin\Theta = (-2 \log U_1)^{1/2}
sin 2\pi U_2$.

Then X and Y are independent, zero-mean, unit-variance Gaussian random variables.
By repeating the above procedure we can generate any number of such random
variables.


Example 5.49 


Use Octave or MATLAB to generate 1000 independent zero-mean, unit-variance Gaussian ran- 
dom variables. Compare a histogram of the observed values with the pdf of a zero-mean unit- 
variance random variable. 

The Octave commands below show the steps for generating the Gaussian random vari- 
ables. A set of histogram range values K from —4 to 4 is created and used to build a normalized 
histogram Z. The points in Z are then plotted and compared to the value predicted to fall in 
each interval by the Gaussian pdf. These plots are shown in Fig. 5.28, which shows excellent 
agreement. 


> U1=rand(1000,1);        % Create a 1000-element vector U1 (step 1).
> U2=rand(1000,1);        % Create a 1000-element vector U2 (step 1).
> R2=-2*log(U1);          % Find R^2 (step 2).
> TH=2*pi*U2;             % Find Theta (step 2).
> X=sqrt(R2).*cos(TH);    % Generate X (step 3).
> Y=sqrt(R2).*sin(TH);    % Generate Y (step 3).
> K=-4:.2:4;              % Create histogram range values K.
> Z=hist(X,K)/1000        % Create normalized histogram Z based on K.
> bar(K,Z)                % Plot Z.
> hold on
> stem(K,.2*exp(-K.^2/2)/sqrt(2*pi))   % Compare to values predicted by pdf.

FIGURE 5.28
Histogram of 1000 observations of a Gaussian random variable.

FIGURE 5.29
Scattergram of 5000 pairs of jointly Gaussian random variables.


We also plotted the X values vs. the Y values for 5000 pairs of generated random variables 
in a scattergram as shown in Fig. 5.29. Good agreement with the circular symmetry of the jointly 
Gaussian pdf of zero-mean, unit-variance pairs is observed. 

In the next chapter we will show how to generate a vector of jointly Gaussian random 
variables with an arbitrary covariance matrix. 


SUMMARY 


• The joint statistical behavior of a pair of random variables X and Y is specified
by the joint cumulative distribution function, the joint probability mass function,
or the joint probability density function. The probability of any event involving
the joint behavior of these random variables can be computed from
these functions.

• The statistical behavior of individual random variables from X is specified by the
marginal cdf, marginal pdf, or marginal pmf that can be obtained from the joint
cdf, joint pdf, or joint pmf of X.

• Two random variables are independent if the probability of a product-form event
is equal to the product of the probabilities of the component events. Equivalent
conditions for the independence of a set of random variables are that the joint
cdf, joint pdf, or joint pmf factors into the product of the corresponding marginal
functions.

• The covariance and the correlation coefficient of two random variables are measures
of the linear dependence between the random variables.

• If X and Y are independent, then X and Y are uncorrelated, but not vice versa. If
X and Y are jointly Gaussian and uncorrelated, then they are independent.

• The statistical behavior of X, given the exact values of X or Y, is specified by the
conditional cdf, conditional pmf, or conditional pdf. Many problems lend themselves
to a solution that involves conditioning on the value of one of the random
variables. In these problems, the expected value of random variables can be obtained
by conditional expectation.

• The joint pdf of a pair of jointly Gaussian random variables is determined by the
means, variances, and covariance. All marginal pdf’s and conditional pdf’s are
also Gaussian pdf’s.

• Independent Gaussian random variables can be generated by a transformation of
uniform random variables.


CHECKLIST OF IMPORTANT TERMS 


Central moments of X and Y
Conditional cdf
Conditional expectation
Conditional pdf
Conditional pmf
Correlation of X and Y
Covariance of X and Y
Independent random variables
Joint cdf
Joint moments of X and Y
Joint pdf
Joint pmf
Jointly continuous random variables
Jointly Gaussian random variables
Linear transformation
Marginal cdf
Marginal pdf
Marginal pmf
Orthogonal random variables
Product-form event
Uncorrelated random variables


ANNOTATED REFERENCES 


Papoulis [1] is the standard reference for electrical engineers for the material on ran- 
dom variables. References [2] and [3] present many interesting examples involving 
multiple random variables. The book by Jayant and Noll [4] gives numerous applica- 
tions of probability concepts to the digital coding of waveforms. 
McGraw-Hill, New York, 2002.

Section 5.1: Two Random Variables


5.1. Let X be the maximum and let Y be the minimum of the number of heads obtained when
Carlos and Michael each flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to $S_{XY}$, the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X = Y].
(d) Repeat parts b and c if Carlos uses a biased coin with P[heads] = 3/4.

5.2. Let X be the difference and let Y be the sum of the number of heads obtained when
Carlos and Michael each flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to $S_{XY}$, the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X + Y = 1], P[X + Y = 2].

5.3. The input X to a communication channel is “-1” or “1”, with respective probabilities 1/4
and 3/4. The output of the channel Y is equal to: the corresponding input X with probability
$1 - p - p_e$; -X with probability p; 0 with probability $p_e$.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to $S_{XY}$, the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X ≠ Y], P[Y = 0].

5.4. (a) Specify the range of the pair $(N_1, N_2)$ in Example 5.2.
(b) Specify and sketch the event “more revenue comes from type 1 requests than type 2
requests.”

5.5. (a) Specify the range of the pair (Q, R) in Example 5.3.
(b) Specify and sketch the event “last packet is more than half full.”

5.6. Let the pair of random variables H and W be the height and weight in Example 5.1.
The body mass index is a measure of body fat and is defined by BMI = W/H² where
W is in kilograms and H is in meters. Determine and sketch on the plane the
following events: A = {“obese,” BMI ≥ 30}; B = {“overweight,” 25 ≤ BMI < 30};
C = {“normal,” 18.5 ≤ BMI < 25}; and D = {“underweight,” BMI < 18.5}.

5.7. Let (X, Y) be the two-dimensional noise signal in Example 5.4. Specify and sketch the
events:
(a) “Maximum noise magnitude is greater than 5.”
(b) “The noise power X² + Y² is greater than 4.”
(c) “The noise power X² + Y² is greater than 4 and less than 9.”

5.8. For the pair of random variables (X, Y) sketch the region of the plane corresponding to
the following events. Identify which events are of product form.
(a) {X + Y > 3}.
(b) {e^X > Ye}.
(c) {min(X, Y) > 0} ∪ {max(X, Y) < 0}.
(d) {|X - Y| = 1}.
(e) {|X/Y| > 2}.
(f) {X/Y < 2}.
(g) {X > Y}.
(h) {XY < 0}.
(i) {max(|X|, Y) < 3}.

Section 5.2: Pairs of Discrete Random Variables 


5.9. (a) Find and sketch $p_{X,Y}(x, y)$ in Problem 5.1 when using a fair coin.
(b) Find $p_X(x)$ and $p_Y(y)$.
(c) Repeat parts a and b if Carlos uses a biased coin with P[heads] = 3/4.

5.10. (a) Find and sketch $p_{X,Y}(x, y)$ in Problem 5.2 when using a fair coin.
(b) Find $p_X(x)$ and $p_Y(y)$.
(c) Repeat parts a and b if Carlos uses a biased coin with P[heads] = 3/4.

5.11. (a) Find the marginal pmf’s for the pairs of random variables with the indicated joint
pmf.

(i)
X/Y    -1     0     1
-1    1/6   1/6    0
 0     0     0    1/3
 1    1/6   1/6    0

(ii)
X/Y    -1     0     1
-1    1/9   1/9   1/9
 0    1/9   1/9   1/9
 1    1/9   1/9   1/9

(iii)
X/Y    -1     0     1
-1    1/3    0     0
 0     0    1/3    0
 1     0     0    1/3

(b) Find the probability of the events A = {X > 0}, B = {X = Y}, and C =
{X = -Y} for the above joint pmf’s.

5.12. A modem transmits a two-dimensional signal (X, Y) given by:

$$
X = r\cos(2\pi\Theta/8) \qquad \text{and} \qquad Y = r\sin(2\pi\Theta/8)
$$

where $\Theta$ is a discrete uniform random variable in the set {0, 1, 2, ..., 7}.
(a) Show the mapping from S to $S_{XY}$, the range of the pair (X, Y).
(b) Find the joint pmf of X and Y.
(c) Find the marginal pmf of X and of Y.
(d) Find the probability of the following events: A = {X = 0}, B = {Y = r/√2},
C = {X = r/√2, Y = r/√2}, D = {X < -r/√2}.
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5.13. 


5.14. 


5.15. 


Pairs of Random Variables 


Let N; be the number of Web page requests arriving at a server in a 100-ms period and let 

N, be the number of Web page requests arriving at a server in the next 100-ms period. 

Assume that in a 1-ms interval either zero or one page request takes place with respec- 

tive probabilities 1 — p = 0.95 and p = 0.05, and that the requests in different 1-ms in- 

tervals are independent of each other. 

(a) Describe the underlying space S of this random experiment and show the mapping 
from S to Syy, the range of the pair (X, Y). 

(b) Find the joint pmf of X and Y. 

(c) Find the marginal pmf for X and for Y. 

(d) Find the probability of the events A = {X = Y},B = {X =Y =0},C={X >5, 
Y > 3}. 

(e) Find the probability of the event D = {X + Y = 10}. 

Let N, be the number of Web page requests arriving at a server in the period (0, 100) ms 

and let N, be the total combined number of Web page requests arriving at a server in the 

period (0, 200) ms. Assume arrivals occur as in Problem 5.13. 


(a) Describe the underlying space S of this random experiment and show the mapping 
from S to Syy, the range of the pair (X, Y). 

(b) Find the joint pmf of N; and N2. 

(c) Find the marginal pmf for N; and Nj. 

(d) Find the probability of the events A = {N, < No}, B = {N, = 0},C = {N > 5, 
N» > 3}, D = {IN — 2N| < 2}. 

At even time instants, a robot moves either + A cm or —A cm in the x-direction according 

to the outcome of a coin flip; at odd time instants, a robot moves similarly according to 

another coin flip in the y-direction. Assuming that the robot begins at the origin, let X 

and Y be the coordinates of the location of the robot after 2n time instants. 

(a) Describe the underlying space S of this random experiment and show the mapping 
from S to Syy, the range of the pair (X, Y). 

(b) Find the marginal pmf of the coordinates X and Y. 

(c) Find the probability that the robot is within distance V2 of the origin after 2n time 
instants. 
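The marginal-pmf and event-probability calculations asked for in Problem 5.11 can be checked mechanically. The following is only a sketch under stated assumptions: it uses table (i) as printed, assumes that rows index X and columns index Y, and the helper names `marginals` and `event_probability` are hypothetical. Exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction as F

# Joint pmf (i) of Problem 5.11, keyed by (x, y).
# Assumption: rows of the printed table index X and columns index Y.
VALUES = (-1, 0, 1)
joint = {
    (-1, -1): F(1, 6), (-1, 0): F(1, 6), (-1, 1): F(0),
    (0, -1): F(0),     (0, 0): F(0),     (0, 1): F(1, 3),
    (1, -1): F(1, 6),  (1, 0): F(1, 6),  (1, 1): F(0),
}

def marginals(pmf):
    """Sum the joint pmf over y to get p_X and over x to get p_Y."""
    p_x, p_y = {}, {}
    for (x, y), p in pmf.items():
        p_x[x] = p_x.get(x, F(0)) + p
        p_y[y] = p_y.get(y, F(0)) + p
    return p_x, p_y

def event_probability(pmf, event):
    """Add up the joint pmf over the points (x, y) where event(x, y) holds."""
    return sum((p for (x, y), p in pmf.items() if event(x, y)), F(0))

p_x, p_y = marginals(joint)
prob_x_positive = event_probability(joint, lambda x, y: x > 0)

# X and Y are independent only if the joint pmf factors at every point.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in VALUES for y in VALUES)
```

The same two helpers apply unchanged to tables (ii) and (iii) once their entries are substituted into `joint`.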


Section 5.3: The Joint cdf of x and y 


5.16. (a) Sketch the joint cdf for the pair (X, Y) in Problem 5.1 and verify that the properties of
the joint cdf are satisfied. You may find it helpful to first divide the plane into regions
where the cdf is constant.
(b) Find the marginal cdf of X and of Y.

5.17. A point (X, Y) is selected at random inside a triangle defined by {(x, y): 0 ≤ y ≤ x < 1}.
Assume the point is equally likely to fall anywhere in the triangle.

(a) Find the joint cdf of X and Y.
(b) Find the marginal cdf of X and of Y.
(c) Find the probabilities of the following events in terms of the joint cdf:
A = {X ≤ 1/2, Y ≤ 3/4}, B = {1/4 < X ≤ 3/4, 1/4 < Y ≤ 3/4}.

5.18. A dart is equally likely to land at any point $(X_1, X_2)$ inside a circular target of unit radius.
Let R and Θ be the radius and angle of the point $(X_1, X_2)$.

(a) Find the joint cdf of R and Θ.
(b) Find the marginal cdf of R and Θ.
(c) Use the joint cdf to find the probability that the point is in the first quadrant of the
real plane and that the radius is greater than 0.5.

5.19. Find an expression for the probability of the events in Problem 5.8 parts c, h, and i in
terms of the joint cdf of X and Y.

5.20. The pair (X, Y) has joint cdf given by:

$F_{X,Y}(x, y) = \begin{cases} (1 - 1/x^2)(1 - 1/y^2) & \text{for } x > 1,\ y > 1 \\ 0 & \text{elsewhere.} \end{cases}$

(a) Sketch the joint cdf.
(b) Find the marginal cdf of X and of Y.
(c) Find the probability of the following events: {X < 3, Y ≤ 5}, {X > 4, Y > 3}.

5.21. Is the following a valid cdf? Why?

$F_{X,Y}(x, y) = \begin{cases} 1 - \frac{1}{x^2 y^2} & \text{for } x > 1,\ y > 1 \\ 0 & \text{elsewhere.} \end{cases}$

5.22. Let $F_X(x)$ and $F_Y(y)$ be valid one-dimensional cdf’s. Show that $F_{X,Y}(x, y) = F_X(x)F_Y(y)$
satisfies the properties of a two-dimensional cdf.

5.23. The number of users logged onto a system N and the time T until the next user logs off
have joint probability given by:

PIN =n,X =t]=(1-p)p™ "(1 — e™)  forn=1,2,... t>0.

(a) Sketch the above joint probability.
(b) Find the marginal pmf of N.
(c) Find the marginal cdf of X.
(d) Find P[N ≤ 3, X > 3/λ].

5.24. A factory has n machines of a certain type. Let p be the probability that a machine is
working on any given day, and let N be the total number of machines working on a certain
day. The time T required to manufacture an item is an exponentially distributed random
variable with rate kα if k machines are working. Find and P[T ≤ t]. Find P[T ≤ t]
as t → ∞ and explain the result.


Section 5.4: The Joint pdf of Two Continuous Random Variables 


5.25. The amplitudes of two signals X and Y have joint pdf:

fry(xy) = eye" — forx > 0,y > 0.

(a) Find the joint cdf.
(b) Find P[ X"? > Y].
(c) Find the marginal pdfs.

5.26. Let X and Y have joint pdf:

$f_{X,Y}(x, y) = k(x + y)$ for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

(a) Find k.
(b) Find the joint cdf of (X, Y).
(c) Find the marginal pdf of X and of Y.
(d) Find P[X < Y], $P[Y < X^2]$, P[X + Y > 0.5].

5.27. Let X and Y have joint pdf:

$f_{X,Y}(x, y) = kx(1 - x)y$ for 0 < x < 1, 0 < y < 1.

(a) Find k.
(b) Find the joint cdf of (X, Y).
(c) Find the marginal pdf of X and of Y.
(d) Find $P[Y < X^{1/2}]$, P[X < Y].

5.28. The random vector (X, Y) is uniformly distributed (i.e., f(x, y) = k) in the regions shown
in Fig. P5.1 and zero elsewhere.

FIGURE P5.1 Regions (i), (ii), and (iii).

(a) Find the value of k in each case.
(b) Find the marginal pdf for X and for Y in each case.
(c) Find P[X > 0, Y > 0].

5.29. (a) Find the joint cdf for the vector random variable introduced in Example 5.16.
(b) Use the result of part a to find the marginal cdf of X and of Y.

5.30. Let X and Y have the joint pdf:

fxy(x, y) = ye?  forx > 0,y > 0.

Find the marginal pdf of X and of Y.

5.31. Let X and Y be the pair of random variables in Problem 5.17.

(a) Find the joint pdf of X and Y.
(b) Find the marginal pdf of X and of Y.
(c) Find $P[Y < X^2]$.

5.32. Let R and Θ be the pair of random variables in Problem 5.18.

(a) Find the joint pdf of R and Θ.
(b) Find the marginal pdf of R and of Θ.

5.33. Let (X, Y) be the jointly Gaussian random variables discussed in Example 5.18. Find
$P[X^2 + Y^2 > r^2]$ when ρ = 0. Hint: Use polar coordinates to compute the integral.

5.34. The general form of the joint pdf for two jointly Gaussian random variables is given by
Eq. (5.61a). Show that X and Y have marginal pdfs that correspond to Gaussian random
variables with means $m_1$ and $m_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.


5.35. The input X to a communication channel is +1 or −1 with probability p and 1 − p, respectively.
The received signal Y is the sum of X and noise N which has a Gaussian distribution
with zero mean and variance $\sigma^2 = 0.25$.

(a) Find the joint probability P[X = j, Y ≤ y].
(b) Find the marginal pmf of X and the marginal pdf of Y.
(c) Suppose we are given that Y > 0. Which is more likely, X = 1 or X = −1?

5.36. A modem sends a two-dimensional signal X from the set {(1, 1), (1, −1), (−1, 1),
(−1, −1)}. The channel adds a noise signal $(N_1, N_2)$, so the received signal is
$Y = X + N = (X_1 + N_1, X_2 + N_2)$. Assume that $(N_1, N_2)$ have the jointly Gaussian
pdf in Example 5.18 with ρ = 0. Let the distance between X and Y be
$d(X, Y) = \{(X_1 - Y_1)^2 + (X_2 - Y_2)^2\}^{1/2}$.

(a) Suppose that X = (1, 1). Find and sketch the region for the event {Y is closer to (1, 1)
than to the other possible values of X}. Evaluate the probability of this event.
(b) Suppose that X = (1, 1). Find and sketch the region for the event {Y is closer to
(1, −1) than to the other possible values of X}. Evaluate the probability of this
event.
(c) Suppose that X = (1, 1). Find and sketch the region for the event {d(X, Y) > 1}.
Evaluate the probability of this event. Explain why this probability is an upper
bound on the probability that Y is closer to a signal other than X = (1, 1).


Section 5.5: Independence of Two Random Variables 


5.37. Let X be the number of full pairs and let Y be the remainder of the number of dots
observed in a toss of a fair die. Are X and Y independent random variables?

5.38. Let X and Y be the coordinates of the robot in Problem 5.15 after 2n time instants.
Determine whether X and Y are independent random variables.

5.39. Let X and Y be the coordinates of the two-dimensional modem signal (X, Y) in
Problem 5.12.

(a) Determine if X and Y are independent random variables.
(b) Repeat part a if even values of Θ are twice as likely as odd values.

5.40. Determine which of the joint pmfs in Problem 5.11 correspond to independent pairs of
random variables.

5.41. Michael takes the 7:30 bus every morning. The arrival time of the bus at the stop is
uniformly distributed in the interval [7:27, 7:37]. Michael’s arrival time at the stop is also
uniformly distributed in the interval [7:25, 7:40]. Assume that Michael’s and the bus’s arrival
times are independent random variables.

(a) What is the probability that Michael arrives more than 5 minutes before the bus?
(b) What is the probability that Michael misses the bus?

5.42. Are R and Θ independent in Problem 5.18?

5.43. Are X and Y independent in Problem 5.20?

5.44. Are the signal amplitudes X and Y independent in Problem 5.25?

5.45. Are X and Y independent in Problem 5.26?

5.46. Are X and Y independent in Problem 5.27?

5.47. Let X and Y be independent random variables. Find an expression for the probability of
the following events in terms of $F_X(x)$ and $F_Y(y)$.

(a) {a < X ≤ b} ∩ {Y > d}.
(b) {a < X ≤ b} ∩ {c ≤ Y < d}.
(c) {|X| < a} ∩ {c ≤ Y < d}.

5.48. Let X and Y be independent random variables that are uniformly distributed in [−1, 1].
Find the probability of the following events:

(a) $P[X^2 < 1/2, |Y| < 1/2]$.
(b) P[4X < 1, Y < 0].
(c) P[XY < 1/2].
(d) P[max(X, Y) < 1/3].

5.49. Let X and Y be random variables that take on values from the set {−1, 0, 1}.

(a) Find a joint pmf for which X and Y are independent.
(b) Are $X^2$ and $Y^2$ independent random variables for the pmf in part a?
(c) Find a joint pmf for which X and Y are not independent, but for which $X^2$ and $Y^2$
are independent.

5.50. Let X and Y be the jointly Gaussian random variables introduced in Problem 5.34.

(a) Show that X and Y are independent random variables if and only if ρ = 0.
(b) Suppose ρ = 0, find P[XY < 0].

5.51. Two fair dice are tossed repeatedly until a pair occurs. Let K be the number of tosses
required and let X be the number showing up in the pair. Find the joint pmf of K and X and
determine whether K and X are independent.

5.52. The number of devices L produced in a day is geometric distributed with probability of
success p. Let N be the number of working devices and let M be the number of defective
devices produced in a day.

(a) Are N and M independent random variables?
(b) Find the joint pmf of N and M.
(c) Find the marginal pmfs of N and M. (See hint in Problem 5.87b.)
(d) Are L and M independent random variables?

5.53. Let $N_1$ be the number of Web page requests arriving at a server in a 100-ms period and let
$N_2$ be the number of Web page requests arriving at a server in the next 100-ms period.
Use the result of Problem 5.13 parts a and b to develop a model where $N_1$ and $N_2$ are
independent Poisson random variables.

5.54. (a) Show that Eq. (5.22) implies Eq. (5.21).
(b) Show that Eq. (5.21) implies Eq. (5.22).

5.55. Verify that Eqs. (5.22) and (5.23) can be obtained from each other.


Section 5.6: Joint Moments and Expected Values of a Function of Two Random Variables

5.56. (a) Find $E[(X + Y)^2]$.
(b) Find the variance of X + Y.
(c) Under what condition is the variance of the sum equal to the sum of the individual
variances?

5.57. Find E[|X − Y|] if X and Y are independent exponential random variables with
parameters $\lambda_1 = 1$ and $\lambda_2 = 2$, respectively.

5.58. Find $E[X^2 e^Y]$ where X and Y are independent random variables, X is a zero-mean,
unit-variance Gaussian random variable, and Y is a uniform random variable in the
interval [0, 3].

5.59. For the discrete random variables X and Y in Problem 5.1, find the correlation and covariance,
and indicate whether the random variables are independent, orthogonal, or uncorrelated.

5.60. For the discrete random variables X and Y in Problem 5.2, find the correlation and
covariance, and indicate whether the random variables are independent, orthogonal,
or uncorrelated.

5.61. For the three pairs of discrete random variables in Problem 5.11, find the correlation and
covariance of X and Y, and indicate whether the random variables are independent,
orthogonal, or uncorrelated.

5.62. Let $N_1$ and $N_2$ be the number of Web page requests in Problem 5.13. Find the correlation
and covariance of $N_1$ and $N_2$, and indicate whether the random variables are independent,
orthogonal, or uncorrelated.

5.63. Repeat Problem 5.62 for $N_1$ and $N_2$, the number of Web page requests in Problem 5.14.

5.64. Let N and T be the number of users logged on and the time till the next logoff in
Problem 5.23. Find the correlation and covariance of N and T, and indicate whether
the random variables are independent, orthogonal, or uncorrelated.

5.65. Find the correlation and covariance of X and Y in Problem 5.26. Determine whether X
and Y are independent, orthogonal, or uncorrelated.

5.66. Repeat Problem 5.65 for X and Y in Problem 5.27.

5.67. For the three pairs of continuous random variables X and Y in Problem 5.28, find the
correlation and covariance, and indicate whether the random variables are independent,
orthogonal, or uncorrelated.

5.68. Find the correlation coefficient between X and Y = aX + b. Does the answer depend
on the sign of a?

5.69. Propose a method for estimating the covariance of two random variables.

5.70. (a) Complete the calculations for the correlation coefficient in Example 5.28.
(b) Repeat the calculations if X and Y have the pdf:

fxylx, y) = eD forx > 0,-x<y <x.

5.71. The output of a channel is Y = X + N, where the input X and the noise N are independent,
zero-mean random variables.

(a) Find the correlation coefficient between the input X and the output Y.
(b) Suppose we estimate the input X by a linear function g(Y) = aY. Find the value of a
that minimizes the mean squared error $E[(X - aY)^2]$.
(c) Express the resulting mean-square error in terms of $\sigma_X/\sigma_N$.

5.72. In Example 5.27 let $X = \cos(\Theta/4)$ and $Y = \sin(\Theta/4)$. Are X and Y uncorrelated?

5.73. (a) Show that COV(X, E[Y | X]) = COV(X, Y).
(b) Show that E[Y | X = x] = E[Y], for all x, implies that X and Y are uncorrelated.

5.74. Use the fact that $E[(tX + Y)^2] \ge 0$ for all t to prove the Cauchy-Schwarz inequality:

$(E[XY])^2 \le E[X^2]E[Y^2].$

Hint: Consider the discriminant of the quadratic equation in t that results from the above
inequality.

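Problems 5.68 and 5.69 lend themselves to a quick numerical check. The sketch below is one possible approach, not a prescribed answer: it assumes the usual sample-moment estimator (dividing by n − 1), draws X from a zero-mean, unit-variance Gaussian purely for illustration, and uses hypothetical helper names `sample_moments` and `correlation_coefficient`.

```python
import math
import random

def sample_moments(pairs):
    """Return sample means, sample variances, and sample covariance of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)
    return mean_x, mean_y, var_x, var_y, cov_xy

def correlation_coefficient(pairs):
    """Sample correlation coefficient: covariance divided by the product of std devs."""
    _, _, var_x, var_y, cov_xy = sample_moments(pairs)
    return cov_xy / math.sqrt(var_x * var_y)

rng = random.Random(1)                      # fixed seed so the run is repeatable
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Y = aX + b with a > 0 and with a < 0 (illustrative values a = +2, -2 and b = 1).
pairs_pos = [(x, 2.0 * x + 1.0) for x in xs]
pairs_neg = [(x, -2.0 * x + 1.0) for x in xs]

rho_pos = correlation_coefficient(pairs_pos)
rho_neg = correlation_coefficient(pairs_neg)
_, _, var_x, var_y, cov_xy = sample_moments(pairs_pos)
```

The sample quantities obey the same Cauchy-Schwarz bound as in Problem 5.74, so `cov_xy ** 2` should never exceed `var_x * var_y` beyond rounding error.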

Section 5.7: Conditional Probability and Conditional Expectation 


5.75. (a) Find $p_Y(y \mid x)$ and $p_X(x \mid y)$ in Problem 5.1 assuming fair coins are used.
(b) Find $p_Y(y \mid x)$ and $p_X(x \mid y)$ in Problem 5.1 assuming Carlos uses a coin with
p = 3/4.
(c) What is the effect on $p_X(x \mid y)$ of Carlos using a biased coin?
(d) Find E[Y | X = x] and E[X | Y = y] in part a; then find E[X] and E[Y].
(e) Find E[Y | X = x] and E[X | Y = y] in part b; then find E[X] and E[Y].

5.76. (a) Find $p_X(x \mid y)$ for the communication channel in Problem 5.3.
(b) For each value of y, find the value of x that maximizes $p_X(x \mid y)$. State any
assumptions about p and pe.
(c) Find the probability of error if a receiver uses the decision rule from part b.

5.77. (a) In Problem 5.11(i), which conditional pmf given X provides the most information
about Y: $p_Y(y \mid -1)$, $p_Y(y \mid 0)$, or $p_Y(y \mid +1)$? Explain why.
(b) Compare the conditional pmfs in Problems 5.11(ii) and (iii) and explain which of
these two cases is “more random.”
(c) Find E[Y | X = x] and E[X | Y = y] in Problems 5.11(i), (ii), (iii); then find E[X]
and E[Y].
(d) Find $E[Y^2 \mid X = x]$ and $E[X^2 \mid Y = y]$ in Problems 5.11(i), (ii), (iii); then find
VAR[X] and VAR[Y].

5.78. (a) Find the conditional pmf of $N_1$ given $N_2$ in Problem 5.14.
(b) Find $P[N_1 = k \mid N_2 = 2k]$ for k = 5, 10, 20. Hint: Use Stirling’s formula.
(c) Find $E[N_1 \mid N_2 = k]$, then find $E[N_1]$.

5.79. In Example 5.30, let Y be the number of defects inside the region R and let Z be the
number of defects outside the region.

(a) Find the pmf of Z given Y.
(b) Find the joint pmf of Y and Z.
(c) Are Y and Z independent random variables? Is the result intuitive?

5.80. (a) Find $f_Y(y \mid x)$ in Problem 5.26.
(b) Find P[Y > X | x].
(c) Find P[Y > X] using part b.
(d) Find E[Y | X = x].

5.81. (a) Find $f_Y(y \mid x)$ in Problem 5.28(i).
(b) Find E[Y | X = x] and E[Y].
(c) Repeat parts a and b of Problem 5.28(ii).
(d) Repeat parts a and b of Problem 5.28(iii).

5.82. (a) Find $f_Y(y \mid x)$ in Example 5.27.
(b) Find E[Y | X = x].
(c) Find E[Y].
(d) Find E[XY | X = x].
(e) Find E[XY].

5.83. Find $f_Y(y \mid x)$ and $f_X(x \mid y)$ for the jointly Gaussian pdf in Problem 5.34.

5.84. (a) Find $f_X(t \mid N = n)$ in Problem 5.23.
(b) Find E[X'|N = n].
(c) Find the value of n that maximizes P[N = n | t < X < t + dt].

5.85. (a) Find $p_Y(y \mid x)$ and $p_X(x \mid y)$ in Problem 5.12.
(b) Find E[Y | X = x].
(c) Find E[XY | X = x] and E[XY].

5.86. A customer enters a store and is equally likely to be served by one of three clerks. The
time taken by clerk 1 is a constant random variable with mean two minutes; the time for
clerk 2 is exponentially distributed with mean two minutes; and the time for clerk 3 is
Pareto distributed with mean two minutes and α = 2.5.

(a) Find the pdf of T, the time taken to service a customer.
(b) Find E[T] and VAR[T].

5.87. A message requires N time units to be transmitted, where N is a geometric random
variable with pmf $p_i = (1 - a)a^{i-1}$, i = 1, 2, .... A single new message arrives during
a time unit with probability p, and no messages arrive with probability 1 − p.
Let K be the number of new messages that arrive during the transmission of a
single message.

(a) Find E[K] and VAR[K] using conditional expectation.
(b) Find the pmf of K. Hint: (1 — py“ = > ("er
n=k
(c) Find the conditional pmf of N given K = k.
(d) Find the value of n that maximizes P[N = n|X = k].

5.88. The number of defects in a VLSI chip is a Poisson random variable with rate r. However,
r is itself a gamma random variable with parameters α and λ.

(a) Use conditional expectation to find E[N] and VAR[N].
(b) Find the pmf for N, the number of defects.

5.89. (a) In Problem 5.35, find the conditional pmf of the input X of the communication channel
given that the output is in the interval y < Y ≤ y + dy.
(b) Find the value of X that is more probable given y < Y ≤ y + dy.
(c) Find an expression for the probability of error if we use the result of part b to decide
what the input to the channel was.


Section 5.8: Functions of Two Random Variables 


5.90. Two toys are started at the same time each with a different battery. The first battery has a
lifetime that is exponentially distributed with mean 100 minutes; the second battery has a
Rayleigh-distributed lifetime with mean 100 minutes.

(a) Find the cdf of the time T until the battery in a toy first runs out.
(b) Suppose that both toys are still operating after 100 minutes. Find the cdf of the time
T, that subsequently elapses until the battery in a toy first runs out.
(c) In part b, find the cdf of the total time that elapses until a battery first fails.

5.91. (a) Find the cdf of the time that elapses until both batteries run out in Problem 5.90a.
(b) Find the cdf of the remaining time until both batteries run out in Problem 5.90b.

5.92. Let K and N be independent random variables with nonnegative integer values.

(a) Find an expression for the pmf of M = K + N.
(b) Find the pmf of M if K and N are binomial random variables with parameters (k, p)
and (n, p).
(c) Find the pmf of M if K and N are Poisson random variables with parameters $\alpha_1$ and
$\alpha_2$, respectively.

5.93. The number X of goals the Bulldogs score against the Flames has a geometric distribution
with mean 2; the number of goals Y that the Flames score against the Bulldogs is also
geometrically distributed but with mean 4.

(a) Find the pmf of Z = X − Y. Assume X and Y are independent.
(b) What is the probability that the Bulldogs beat the Flames? Tie the Flames?
(c) Find E[Z].

5.94. Passengers arrive at an airport taxi stand every minute according to a Bernoulli random
variable. A taxi will not leave until it has two passengers.

(a) Find the pmf of the time T until the taxi has two passengers.
(b) Find the pmf for the time that the first customer waits.

5.95. Let X and Y be independent random variables that are uniformly distributed in the
interval [0, 1]. Find the pdf of Z = XY.

5.96. Let $X_1$, $X_2$, and $X_3$ be independent and uniformly distributed in [−1, 1].

(a) Find the cdf and pdf of $Y = X_1 + X_2$.
(b) Find the cdf of $Z = Y + X_3$.

5.97. Let X and Y be independent random variables with gamma distributions and parameters
$(\alpha_1, \lambda)$ and $(\alpha_2, \lambda)$, respectively. Show that Z = X + Y is gamma-distributed with
parameters $(\alpha_1 + \alpha_2, \lambda)$. Hint: See Eq. (4.59).

5.98. Signals X and Y are independent. X is exponentially distributed with mean 1 and Y is
exponentially distributed with mean 1.

(a) Find the cdf of Z = |X − Y|.
(b) Use the result of part a to find E[Z].

5.99. The random variables X and Y have the joint pdf

fxy(% y) =e OY) forO << y<x<1.

Find the pdf of Z = X + Y.

5.100. Let X and Y be independent Rayleigh random variables with parameters α = β = 1.
Find the pdf of Z = X/Y.

5.101. Let X and Y be independent Gaussian random variables that are zero mean and unit
variance. Show that Z = X/Y is a Cauchy random variable.

5.102. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent
and X is uniformly distributed in [0, 1] and Y is uniformly distributed in [0, 1].

5.103. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent
exponential random variables with the same mean.

5.104. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are the independent
Pareto random variables with the same distribution.

5.105. Let W = X + Y and Z = X − Y.

(a) Find an expression for the joint pdf of W and Z.
(b) Find $f_{W,Z}(z, w)$ if X and Y are independent exponential random variables with
parameter λ = 1.
(c) Find $f_{W,Z}(z, w)$ if X and Y are independent Pareto random variables with the same
distribution.

5.106. The pair (X, Y) is uniformly distributed in a ring centered about the origin and inner and
outer radii $r_1 < r_2$. Let R and Θ be the radius and angle corresponding to (X, Y). Find the
joint pdf of R and Θ.

5.107. Let X and Y be independent, zero-mean, unit-variance Gaussian random variables. Let
V = aX + bY and W = cX + eY.

(a) Find the joint pdf of V and W, assuming the transformation matrix A is invertible.
(b) Suppose A is not invertible. What is the joint pdf of V and W?

5.108. Let X and Y be independent Gaussian random variables that are zero mean and unit
variance. Let $W = X^2 + Y^2$ and let $\Theta = \tan^{-1}(Y/X)$. Find the joint pdf of W and Θ.

5.109. Let X and Y be the random variables introduced in Example 5.4. Let $R = (X^2 + Y^2)^{1/2}$
and let $\Theta = \tan^{-1}(Y/X)$.

(a) Find the joint pdf of R and Θ.
(b) What is the joint pdf of X and Y?


Section 5.9: Pairs of Jointly Gaussian Variables 
5.110. Let X and Y be jointly Gaussian random variables with pdf

$f_{X,Y}(x, y) = \frac{\exp\{-2x^2 - y^2/2\}}{2\pi c}$ for all x, y.

Find VAR[X], VAR[Y], and COV(X, Y).

5.111. Let X and Y be jointly Gaussian random variables with pdf

$f_{X,Y}(x, y) = \frac{\exp\{-\frac{1}{2}[x^2 + 4y^2 - 3xy + 3y - 2x + 1]\}}{2\pi c}$ for all x, y.

Find E[X], E[Y], VAR[X], VAR[Y], and COV(X, Y).

5.112. Let X and Y be jointly Gaussian random variables with E[Y] = 0, $\sigma_1 = 1$, $\sigma_2 = 2$, and
E[X | Y] = Y/4 + 1. Find the joint pdf of X and Y.

5.113. Let X and Y be zero-mean, independent Gaussian random variables with $\sigma^2 = 1$.

(a) Find the value of r for which the probability that (X, Y) falls inside a circle of radius
r is 1/2.
(b) Find the conditional pdf of (X, Y) given that (X, Y) is not inside a ring with inner
radius $r_1$ and outer radius $r_2$.


5.114. Use a plotting program (as provided by Octave or MATLAB) to show the pdf for jointly 
Gaussian zero-mean random variables with the following parameters: 


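(a) $\sigma_1 = 1$, $\sigma_2 = 1$, ρ = 0.
(b) $\sigma_1 = 1$, $\sigma_2 = 1$, ρ = 0.8.
(c) $\sigma_1 = 1$, $\sigma_2 = 1$, ρ = −0.8.
(d) $\sigma_1 = 1$, $\sigma_2 = 2$, ρ = 0.
(e) $\sigma_1 = 1$, $\sigma_2 = 2$, ρ = 0.8.
(f) $\sigma_1 = 1$, $\sigma_2 = 10$, ρ = 0.8.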
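Before plotting the cases in Problem 5.114, it can help to tabulate the pdf on a grid and confirm that it carries essentially unit mass. The sketch below is written in Python rather than Octave or MATLAB and rests on some assumptions: the two scale parameters are read as standard deviations, the standard zero-mean jointly Gaussian form with |ρ| < 1 is used, and the names `joint_gaussian_pdf`, `pdf_grid`, and `grid_mass` are hypothetical. The resulting `values` array is what a surface or contour routine would take as input.

```python
import math

def joint_gaussian_pdf(x, y, sigma1, sigma2, rho):
    """Zero-mean jointly Gaussian pdf; sigma1, sigma2 are std deviations, |rho| < 1."""
    one_minus_rho2 = 1.0 - rho * rho
    zx, zy = x / sigma1, y / sigma2
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_rho2
    norm = 2.0 * math.pi * sigma1 * sigma2 * math.sqrt(one_minus_rho2)
    return math.exp(-0.5 * q) / norm

def pdf_grid(sigma1, sigma2, rho, span=4.0, steps=81):
    """Evaluate the pdf on a steps-by-steps grid covering +/- span standard deviations."""
    xs = [(-span + 2.0 * span * i / (steps - 1)) * sigma1 for i in range(steps)]
    ys = [(-span + 2.0 * span * j / (steps - 1)) * sigma2 for j in range(steps)]
    values = [[joint_gaussian_pdf(x, y, sigma1, sigma2, rho) for x in xs] for y in ys]
    return xs, ys, values

def grid_mass(xs, ys, values):
    """Riemann-sum approximation of the probability mass covered by the grid."""
    dx = xs[1] - xs[0]
    dy = ys[1] - ys[0]
    return sum(sum(row) for row in values) * dx * dy

# Case (e): sigma1 = 1, sigma2 = 2, rho = 0.8.
xs, ys, values = pdf_grid(1.0, 2.0, 0.8)
mass = grid_mass(xs, ys, values)          # should be close to 1
peak = joint_gaussian_pdf(0.0, 0.0, 1.0, 2.0, 0.8)
```

Scaling the grid by each standard deviation keeps case (f), where the second scale parameter is 10, from being clipped by a fixed plotting window.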
5.115. Let X and Y be zero-mean, jointly Gaussian random variables with $\sigma_1 = 1$, $\sigma_2 = 2$, and
correlation coefficient ρ.


(a) Plot the principal axes of the constant-pdf ellipse of (X, Y). 
(b) Plot the conditional expectation of Y given X = x. 
(c) Are the plots in parts a and b the same or different? Why? 


5.116. Let X and Y be zero-mean, unit-variance jointly Gaussian random variables for which
ρ = 1. Sketch the joint cdf of X and Y. Does a joint pdf exist?

5.117. Let h(x, y) be a joint Gaussian pdf for zero-mean, unit-variance Gaussian random variables
with correlation coefficient $\rho_1$. Let g(x, y) be a joint Gaussian pdf for zero-mean,
unit-variance Gaussian random variables with correlation coefficient $\rho_2 \neq \rho_1$. Suppose
the random variables X and Y have joint pdf

$f_{X,Y}(x, y) = \{h(x, y) + g(x, y)\}/2.$

(a) Find the marginal pdf for X and for Y.
(b) Explain why X and Y are not jointly Gaussian random variables.

5.118. Use conditional expectation to show that for X and Y zero-mean, jointly Gaussian random
variables, $E[X^2Y^2] = E[X^2]E[Y^2] + 2E[XY]^2$.

5.119. Let X = (X, Y) be the zero-mean jointly Gaussian random variables in Problem 5.110.
Find a transformation A such that Z = AX has components that are zero-mean,
unit-variance Gaussian random variables.

5.120. In Example 5.47, suppose we estimate the value of the signal X from the noisy observation
Y by:
1+0 flak
(a) Evaluate the mean square estimation error: $E[(X - \hat{X})^2]$.
(b) How does the estimation error in part a vary with signal-to-noise ratio $\sigma_X/\sigma_N$?

Section 5.10: Generating Independent Gaussian Random Variables 


5.121. Find the inverse of the cdf of the Rayleigh random variable to derive the transformation
method for generating Rayleigh random variables. Show that this method leads to the same
algorithm that was presented in Section 5.10.

5.122. Reproduce the results presented in Example 5.49.

5.123. Consider the two-dimensional modem in Problem 5.36.

(a) Generate 10,000 discrete random variables uniformly distributed in the set
{1, 2, 3, 4}. Assign each outcome in this set to one of the signals
{(1, 1), (1, −1), (−1, 1), (−1, −1)}. The sequence of discrete random variables
then produces a sequence of 10,000 signal points X.
(b) Generate 10,000 noise pairs N of independent zero-mean, unit-variance jointly
Gaussian random variables.
(c) Form the sequence of 10,000 received signals $Y = (Y_1, Y_2) = X + N$.
(d) Plot the scattergram of received signal vectors. Is the plot what you expected?
(e) Estimate the transmitted signal by the quadrant that Y falls in:
$\hat{X} = (\mathrm{sgn}(Y_1), \mathrm{sgn}(Y_2))$.
(f) Compare the estimates with the actually transmitted signals to estimate the
probability of error.

5.124. Generate a sequence of 1000 pairs of independent zero-mean Gaussian random variables,
where X has variance 2 and N has variance 1. Let Y = X + N be the noisy signal
from Example 5.47.

(a) Estimate X using the estimator in Problem 5.120, and calculate the sequence of
estimation errors.
(b) What is the pdf of the estimation error?
(c) Compare the mean, variance, and relative frequencies of the estimation error with
the result from part b.
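The simulation steps of Problem 5.123, together with the Rayleigh transformation of Problem 5.121, can be sketched as follows. This is an illustrative Python outline rather than the Octave/MATLAB code the text has in mind. It assumes the unit-parameter Rayleigh cdf $F_R(r) = 1 - e^{-r^2/2}$, whose inverse is $r = \sqrt{-2\ln(1 - u)}$, and pairs that radius with a uniform angle to obtain two independent Gaussian noise samples. The names `gaussian_pair`, `sgn`, and `simulate_modem` are hypothetical, and the tie-breaking choice sgn(0) = +1 is an arbitrary convention.

```python
import math
import random

SIGNALS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def gaussian_pair(rng):
    """Two independent zero-mean, unit-variance Gaussians from two uniforms."""
    u1 = 1.0 - rng.random()                    # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))    # Rayleigh sample via the inverse cdf
    theta = 2.0 * math.pi * u2                 # angle uniform in [0, 2*pi)
    return radius * math.cos(theta), radius * math.sin(theta)

def sgn(v):
    """Sign function; zero is mapped to +1 (an arbitrary convention)."""
    return 1 if v >= 0 else -1

def simulate_modem(n, rng):
    """Send n random signal points through the noisy channel; return error rate and outputs."""
    errors = 0
    received = []
    for _ in range(n):
        x = SIGNALS[rng.randrange(4)]          # part (a): uniform choice of signal point
        n1, n2 = gaussian_pair(rng)            # part (b): unit-variance noise pair
        y = (x[0] + n1, x[1] + n2)             # part (c): received signal
        received.append(y)
        x_hat = (sgn(y[0]), sgn(y[1]))         # part (e): quadrant decision
        if x_hat != x:
            errors += 1
    return errors / n, received                # part (f): relative frequency of errors

rng = random.Random(2024)                      # fixed seed so the run is repeatable
error_rate, received = simulate_modem(10_000, rng)
```

The list `received` holds the points for the scattergram in part (d). Because each coordinate is decoded independently, the estimated error rate can be compared against $1 - (1 - Q(1))^2$, where Q is the Gaussian tail function.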


5.125. Let $X_1, X_2, \ldots, X_{1000}$ be a sequence of zero-mean, unit-variance independent Gaussian
random variables. Suppose that the sequence is “smoothed” as follows:

$Y_n = (X_n + X_{n-1})/2$ where $X_0 = 0$.

(a) Find the pdf of $(Y_n, Y_{n+1})$.
(b) Generate the sequence of $X_n$ and the corresponding sequence $Y_n$. Plot the
scattergram of $(Y_n, Y_{n+1})$. Does it agree with the result from part a?
(c) Repeat parts a and b for $Z_n = (X_n - X_{n-1})/2$.

5.126. Let X and Y be independent, zero-mean, unit-variance Gaussian random variables. Find the
linear transformation to generate jointly Gaussian random variables with means $m_1$, $m_2$,
variances $\sigma_1^2$, $\sigma_2^2$, and correlation coefficient ρ. Hint: Use the conditional pdf in Eq. (5.64).

5.127. (a) Use the method developed in Problem 5.126 to generate 1000 pairs of jointly Gaussian
random variables with $m_1 = 1$, $m_2 = -1$, variances $\sigma_1^2 = 1$, $\sigma_2^2 = 2$, and correlation
coefficient ρ = −1/2.
(b) Plot a two-dimensional scattergram of the 1000 pairs and compare to equal-pdf contour
lines for the theoretical pdf.

5.128. Let H and W be the height and weight of adult males. Studies have shown that H (in cm)
and V = ln W (W in kg) are jointly Gaussian with parameters $m_H = 174$ cm, $m_V = 4.4$,
$\sigma_H^2 = 42.36$, $\sigma_V^2 = 0.021$, and COV(H, V) = 0.458.

(a) Use the method in part a to generate 1000 pairs (H, V). Plot a scattergram to check
the joint pdf.
(b) Convert the (H, V) pairs into (H, W) pairs.
(c) Calculate the body mass index for each outcome, and estimate the proportion of the
population that is underweight, normal, overweight, or obese. (See Problem 5.6.)
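One way to approach Problems 5.126 through 5.128 numerically is sketched below in Python. The linear map used here, $X_1 = m_1 + \sigma_1 U$ and $X_2 = m_2 + \sigma_2(\rho U + \sqrt{1 - \rho^2}\,V)$ with U and V independent zero-mean, unit-variance Gaussians, is one valid choice among many and is not necessarily the form the text intends. The sketch assumes the two spread parameters in Problem 5.128 are variances, takes the BMI thresholds from Problem 5.6, and uses hypothetical names `jointly_gaussian_pair`, `sample_correlation`, and `bmi_category`.

```python
import math
import random

def jointly_gaussian_pair(rng, m1, m2, var1, var2, rho):
    """Map two independent N(0, 1) samples to a correlated Gaussian pair."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    x1 = m1 + math.sqrt(var1) * u
    x2 = m2 + math.sqrt(var2) * (rho * u + math.sqrt(1.0 - rho * rho) * v)
    return x1, x2

def sample_correlation(pairs):
    """Sample correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs) / n
    var1 = sum((a - mean1) ** 2 for a, _ in pairs) / n
    var2 = sum((b - mean2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(var1 * var2)

def bmi_category(bmi):
    """Classify a body mass index using the thresholds of Problem 5.6."""
    if bmi >= 30.0:
        return "obese"
    if bmi >= 25.0:
        return "overweight"
    if bmi >= 18.5:
        return "normal"
    return "underweight"

rng = random.Random(7)                         # fixed seed so the run is repeatable

# Problem 5.127: m1 = 1, m2 = -1, variances 1 and 2, rho = -1/2.
pairs = [jointly_gaussian_pair(rng, 1.0, -1.0, 1.0, 2.0, -0.5) for _ in range(1000)]
rho_hat = sample_correlation(pairs)

# Problem 5.128: H in cm, V = ln W with W in kg (spreads read as variances).
M_H, M_V = 174.0, 4.4
VAR_H, VAR_V, COV_HV = 42.36, 0.021, 0.458
RHO_HV = COV_HV / math.sqrt(VAR_H * VAR_V)

counts = {"underweight": 0, "normal": 0, "overweight": 0, "obese": 0}
for _ in range(1000):
    h_cm, v = jointly_gaussian_pair(rng, M_H, M_V, VAR_H, VAR_V, RHO_HV)
    w_kg = math.exp(v)                         # part (b): convert V back to weight
    bmi = w_kg / (h_cm / 100.0) ** 2           # height converted from cm to m
    counts[bmi_category(bmi)] += 1
proportions = {name: c / 1000 for name, c in counts.items()}
```

The estimated proportions vary from run to run with only 1000 pairs, so a larger sample or several seeds gives a better sense of their spread.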


Problems Requiring Cumulative Knowledge 


5.129. The random variables X and Y have joint pdf:

$$f_{X,Y}(x, y) = c \sin(x + y), \qquad 0 \le x \le \pi/2,\ 0 \le y \le \pi/2.$$

(a) Find the value of the constant c.

(b) Find the joint cdf of X and Y.

(c) Find the marginal pdf’s of X and of Y.

(d) Find the mean, variance, and covariance of X and Y.

5.130. An inspector selects an item for inspection according to the outcome of a coin flip: The item is inspected if the outcome is heads. Suppose that the time between item arrivals is an exponential random variable with mean one. Assume the time to inspect an item is a constant value t.

(a) Find the pmf for the number of item arrivals between consecutive inspections.

(b) Find the pdf for the time X between item inspections. Hint: Use conditional expectation.

(c) Find the value of p, so that with a probability of 90% an inspection is completed before the next item is selected for inspection.

5.131. The lifetime X of a device is an exponential random variable with mean = 1/R. Suppose that due to irregularities in the production process, the parameter R is random and has a gamma distribution.

(a) Find the joint pdf of X and R.
(b) Find the pdf of X.
(c) Find the mean and variance of X.

5.132. Let X and Y be samples of a random signal at two time instants. Suppose that X and Y are independent zero-mean Gaussian random variables with the same variance. When signal “0” is present the variance is $\sigma_0^2$, and when signal “1” is present the variance is $\sigma_1^2 > \sigma_0^2$. Suppose signals 0 and 1 occur with probabilities p and 1 − p, respectively. Let $R^2 = X^2 + Y^2$ be the total energy of the two observations.

(a) Find the pdf of $R^2$ when signal 0 is present; when signal 1 is present. Find the pdf of $R^2$.

(b) Suppose we use the following “signal detection” rule: If $R^2 > T$, then we decide signal 1 is present; otherwise, we decide signal 0 is present. Find an expression for the probability of error in terms of T.

(c) Find the value of T that minimizes the probability of error.

5.133. Let $U_0, U_1, \ldots$ be a sequence of independent zero-mean, unit-variance Gaussian random variables. A “low-pass filter” takes the sequence $U_i$ and produces the output sequence $X_n = (U_n + U_{n-1})/2$, and a “high-pass filter” produces the output sequence $Y_n = (U_n - U_{n-1})/2$.

(a) Find the joint pdf of $X_n$ and $X_{n-1}$; of $X_n$ and $X_{n+m}$, $m > 1$.

(b) Repeat part a for $Y_n$.

(c) Find the joint pdf of $X_n$ and $Y_n$.


CHAPTER 6

Vector Random Variables


In the previous chapter we presented methods for dealing with two random variables. In this chapter we extend these methods to the case of n random variables in the following ways:

• By representing n random variables as a vector, we obtain a compact notation for the joint pmf, cdf, and pdf as well as marginal and conditional distributions.

• We present a general method for finding the pdf of transformations of vector random variables.

• Summary information of the distribution of a vector random variable is provided by an expected value vector and a covariance matrix.

• We use linear transformations and characteristic functions to find alternative representations of random vectors and their probabilities.

• We develop optimum estimators for estimating the value of a random variable based on observations of other random variables.

• We show how jointly Gaussian random vectors have a compact and easy-to-work-with pdf and characteristic function.


6.1 VECTOR RANDOM VARIABLES


The notion of a random variable is easily generalized to the case where several quantities are of interest. A vector random variable $\mathbf{X}$ is a function that assigns a vector of real numbers to each outcome $\zeta$ in S, the sample space of the random experiment. We use uppercase boldface notation for vector random variables. By convention $\mathbf{X}$ is a column vector (n rows by 1 column), so the vector random variable with components $X_1, X_2, \ldots, X_n$ corresponds to

$$\mathbf{X} = [X_1, X_2, \ldots, X_n]^{\mathrm T},$$

where “T” denotes the transpose of a matrix or vector. We will sometimes write $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ to save space and omit the transpose unless dealing with matrices. Possible values of the vector random variable are denoted by $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, where $x_i$ corresponds to the value of $X_i$.


Example 6.1 Arrivals at a Packet Switch 


Packets arrive at each of three input ports of a packet switch according to independent Bernoulli trials with p = 1/2. Each arriving packet is equally likely to be destined to any of three output ports. Let $\mathbf{X} = (X_1, X_2, X_3)$ where $X_i$ is the total number of packets arriving for output port i. $\mathbf{X}$ is a vector random variable whose values are determined by the pattern of arrivals at the input ports.


Example 6.2 Joint Poisson Counts 


A random experiment consists of finding the number of defects in a semiconductor chip and identifying their locations. The outcome of this experiment consists of the vector $\zeta = (n, y_1, y_2, \ldots, y_n)$, where the first component specifies the total number of defects and the remaining components specify the coordinates of their location. Suppose that the chip consists of M regions. Let $N_1(\zeta), N_2(\zeta), \ldots, N_M(\zeta)$ be the number of defects in each of these regions, that is, $N_k(\zeta)$ is the number of y’s that fall in region k. The vector $\mathbf{N}(\zeta) = (N_1, N_2, \ldots, N_M)$ is then a vector random variable.


Example 6.3 Samples of an Audio Signal 


Let the outcome $\zeta$ of a random experiment be an audio signal X(t). Let the random variable $X_k = X(kT)$ be the sample of the signal taken at time kT. An MP3 codec processes the audio in blocks of n samples $\mathbf{X} = (X_1, X_2, \ldots, X_n)$. $\mathbf{X}$ is a vector random variable.


Events and Probabilities 


Each event A involving $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ has a corresponding region in an n-dimensional real space $R^n$. As before, we use “rectangular” product-form sets in $R^n$ as building blocks. For the n-dimensional random variable $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ we are interested in events that have the product form

$$A = \{X_1 \text{ in } A_1\} \cap \{X_2 \text{ in } A_2\} \cap \cdots \cap \{X_n \text{ in } A_n\}, \tag{6.1}$$

where each $A_k$ is a one-dimensional event (i.e., subset of the real line) that involves $X_k$ only. The event A occurs when all of the events $\{X_k \text{ in } A_k\}$ occur jointly.
We are interested in obtaining the probabilities of these product-form events:

$$P[A] = P[\mathbf{X} \in A] = P[\{X_1 \text{ in } A_1\} \cap \{X_2 \text{ in } A_2\} \cap \cdots \cap \{X_n \text{ in } A_n\}]$$
$$\triangleq P[X_1 \text{ in } A_1, X_2 \text{ in } A_2, \ldots, X_n \text{ in } A_n]. \tag{6.2}$$

In principle, the probability in Eq. (6.2) is obtained by finding the probability of the equivalent event in the underlying sample space, that is,

$$P[A] = P[\{\zeta \text{ in } S : \mathbf{X}(\zeta) \text{ in } A\}]$$
$$= P[\{\zeta \text{ in } S : X_1(\zeta) \in A_1, X_2(\zeta) \in A_2, \ldots, X_n(\zeta) \in A_n\}]. \tag{6.3}$$

Equation (6.2) forms the basis for the definition of the n-dimensional joint probability mass function, cumulative distribution function, and probability density function. The probabilities of other events can be expressed in terms of these three functions.


Joint Distribution Functions 


The joint cumulative distribution function of $X_1, X_2, \ldots, X_n$ is defined as the probability of an n-dimensional semi-infinite rectangle associated with the point $(x_1, \ldots, x_n)$:

$$F_{\mathbf{X}}(\mathbf{x}) \triangleq F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n]. \tag{6.4}$$

The joint cdf is defined for discrete, continuous, and random variables of mixed type. The probability of product-form events can be expressed in terms of the joint cdf.

The joint cdf generates a family of marginal cdf’s for subcollections of the random variables $X_1, \ldots, X_n$. These marginal cdf’s are obtained by setting the appropriate entries to $+\infty$ in the joint cdf in Eq. (6.4). For example:

Joint cdf for $X_1, \ldots, X_{n-1}$ is given by $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_{n-1}, \infty)$ and
Joint cdf for $X_1$ and $X_2$ is given by $F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \infty, \ldots, \infty)$.

Example 6.4


A radio transmitter sends a signal to a receiver using three paths. Let $X_1$, $X_2$, and $X_3$ be the signals that arrive at the receiver along each path. Find $P[\max(X_1, X_2, X_3) \le 5]$.

The maximum of three numbers is less than or equal to 5 if and only if each of the three numbers is less than or equal to 5; therefore

$$P[A] = P[\{X_1 \le 5\} \cap \{X_2 \le 5\} \cap \{X_3 \le 5\}] = F_{X_1, X_2, X_3}(5, 5, 5).$$

The joint probability mass function of n discrete random variables is defined by

$$p_{\mathbf{X}}(\mathbf{x}) \triangleq p_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n]. \tag{6.5}$$

The probability of any n-dimensional event A is found by summing the pmf over the points in the event

$$P[\mathbf{X} \text{ in } A] = \sum \cdots \sum_{\mathbf{x} \text{ in } A} p_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n). \tag{6.6}$$

The joint pmf generates a family of marginal pmf’s that specifies the joint probabilities for subcollections of the n random variables. For example, the one-dimensional pmf of $X_j$ is found by adding the joint pmf over all variables other than $x_j$:

$$p_{X_j}(x_j) = P[X_j = x_j] = \sum_{x_1} \cdots \sum_{x_{j-1}} \sum_{x_{j+1}} \cdots \sum_{x_n} p_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n). \tag{6.7}$$

The two-dimensional joint pmf of any pair $X_j$ and $X_k$ is found by adding the joint pmf over all n − 2 other variables, and so on. Thus, the marginal pmf for $X_1, \ldots, X_{n-1}$ is given by

$$p_{X_1, \ldots, X_{n-1}}(x_1, x_2, \ldots, x_{n-1}) = \sum_{x_n} p_{X_1, \ldots, X_n}(x_1, x_2, \ldots, x_n). \tag{6.8}$$


A family of conditional pmf’s is obtained from the joint pmf by conditioning on different subcollections of the random variables. For example, if $p_{X_1, \ldots, X_{n-1}}(x_1, \ldots, x_{n-1}) > 0$:

$$p_{X_n}(x_n \mid x_1, \ldots, x_{n-1}) = \frac{p_{X_1, \ldots, X_n}(x_1, \ldots, x_n)}{p_{X_1, \ldots, X_{n-1}}(x_1, \ldots, x_{n-1})}. \tag{6.9a}$$

Repeated applications of Eq. (6.9a) yield the following very useful expression:

$$p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = p_{X_n}(x_n \mid x_1, \ldots, x_{n-1})\, p_{X_{n-1}}(x_{n-1} \mid x_1, \ldots, x_{n-2}) \cdots p_{X_2}(x_2 \mid x_1)\, p_{X_1}(x_1). \tag{6.9b}$$


Example 6.5 Arrivals at a Packet Switch 


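Find the joint pmf of $\mathbf{X} = (X_1, X_2, X_3)$ in Example 6.1. Find $P[X_1 > X_3]$.
Let N be the total number of packets arriving in the three input ports. Each input port has an arrival with probability p = 1/2, so N is binomial with pmf:

$$p_N(n) = \binom{3}{n} \frac{1}{2^3} \qquad \text{for } 0 \le n \le 3.$$

Given N = n, the number of packets arriving for each output port has a multinomial distribution:

$$p_{X_1, X_2, X_3}(i, j, k \mid i + j + k = n) = \begin{cases} \dfrac{n!}{i!\, j!\, k!} \dfrac{1}{3^n} & \text{for } i + j + k = n,\ i \ge 0,\ j \ge 0,\ k \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

The joint pmf of $\mathbf{X}$ is then:

$$p_{\mathbf{X}}(i, j, k) = p_{\mathbf{X}}(i, j, k \mid n) \binom{3}{n} \frac{1}{2^3} \qquad \text{for } i \ge 0,\ j \ge 0,\ k \ge 0,\ i + j + k = n \le 3.$$

The explicit values of the joint pmf are:

$$p_{\mathbf{X}}(0, 0, 0) = \frac{0!}{0!\, 0!\, 0!} \frac{1}{3^0} \binom{3}{0} \frac{1}{2^3} = \frac{1}{8}$$
$$p_{\mathbf{X}}(1, 0, 0) = p_{\mathbf{X}}(0, 1, 0) = p_{\mathbf{X}}(0, 0, 1) = \frac{1!}{0!\, 0!\, 1!} \frac{1}{3^1} \binom{3}{1} \frac{1}{2^3} = \frac{3}{24}$$
$$p_{\mathbf{X}}(1, 1, 0) = p_{\mathbf{X}}(1, 0, 1) = p_{\mathbf{X}}(0, 1, 1) = \frac{2!}{0!\, 1!\, 1!} \frac{1}{3^2} \binom{3}{2} \frac{1}{2^3} = \frac{6}{72}$$
$$p_{\mathbf{X}}(2, 0, 0) = p_{\mathbf{X}}(0, 2, 0) = p_{\mathbf{X}}(0, 0, 2) = 3/72$$
$$p_{\mathbf{X}}(1, 1, 1) = 6/216$$
$$p_{\mathbf{X}}(0, 1, 2) = p_{\mathbf{X}}(0, 2, 1) = p_{\mathbf{X}}(1, 0, 2) = p_{\mathbf{X}}(1, 2, 0) = p_{\mathbf{X}}(2, 0, 1) = p_{\mathbf{X}}(2, 1, 0) = 3/216$$
$$p_{\mathbf{X}}(3, 0, 0) = p_{\mathbf{X}}(0, 3, 0) = p_{\mathbf{X}}(0, 0, 3) = 1/216.$$

Finally:

$$P[X_1 > X_3] = p_{\mathbf{X}}(1, 0, 0) + p_{\mathbf{X}}(1, 1, 0) + p_{\mathbf{X}}(2, 0, 0) + p_{\mathbf{X}}(1, 2, 0) + p_{\mathbf{X}}(2, 0, 1) + p_{\mathbf{X}}(2, 1, 0) + p_{\mathbf{X}}(3, 0, 0)$$
$$= 8/27.$$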
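
The joint pmf of this example can also be checked by brute-force enumeration. The sketch below is one possible way to do it, not the method used in the text: it assumes each input port independently produces no packet with probability 1/2 or a packet for one of the three output ports with probability 1/6 each, and it uses exact rational arithmetic. The names `PORT_OUTCOMES` and `joint_pmf` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Outcome at one input port: None means no arrival; 0, 1, 2 index the output port.
PORT_OUTCOMES = {None: Fraction(1, 2), 0: Fraction(1, 6), 1: Fraction(1, 6), 2: Fraction(1, 6)}

def joint_pmf():
    """Enumerate all arrival patterns at the three input ports and accumulate
    the probability of each count vector (X1, X2, X3)."""
    pmf = defaultdict(Fraction)
    for pattern in product(PORT_OUTCOMES, repeat=3):
        prob = Fraction(1)
        counts = [0, 0, 0]
        for dest in pattern:
            prob *= PORT_OUTCOMES[dest]
            if dest is not None:
                counts[dest] += 1
        pmf[tuple(counts)] += prob
    return dict(pmf)

pmf = joint_pmf()
# Sum the pmf over the points of the event {X1 > X3}, as in Eq. (6.6).
p_x1_gt_x3 = sum(p for (i, j, k), p in pmf.items() if i > k)
print(pmf[(0, 0, 0)], pmf[(1, 1, 0)], p_x1_gt_x3)
```

Because the arithmetic is exact, the enumerated values can be compared directly with the fractions listed in the example.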
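
We say that the random variables $X_1, X_2, \ldots, X_n$ are jointly continuous random variables if the probability of any n-dimensional event A is given by an n-dimensional integral of a probability density function:

$$P[\mathbf{X} \text{ in } A] = \int \cdots \int_{\mathbf{x} \text{ in } A} f_{X_1, \ldots, X_n}(x_1', \ldots, x_n')\, dx_1' \cdots dx_n', \tag{6.10}$$

where $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$ is the joint probability density function.
The joint cdf of $\mathbf{X}$ is obtained from the joint pdf by integration:

$$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{X_1, \ldots, X_n}(x_1', \ldots, x_n')\, dx_1' \cdots dx_n'. \tag{6.11}$$

The joint pdf (if the derivative exists) is given by

$$f_{\mathbf{X}}(\mathbf{x}) \triangleq f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1, \ldots, X_n}(x_1, \ldots, x_n). \tag{6.12}$$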


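A family of marginal pdf’s is associated with the joint pdf in Eq. (6.12). The marginal pdf for a subset of the random variables is obtained by integrating the other variables out. For example, the marginal pdf of $X_1$ is

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1, X_2, \ldots, X_n}(x_1, x_2', \ldots, x_n')\, dx_2' \cdots dx_n'. \tag{6.13}$$

As another example, the marginal pdf for $X_1, \ldots, X_{n-1}$ is given by

$$f_{X_1, \ldots, X_{n-1}}(x_1, \ldots, x_{n-1}) = \int_{-\infty}^{\infty} f_{X_1, \ldots, X_n}(x_1, \ldots, x_{n-1}, x_n')\, dx_n'. \tag{6.14}$$
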
A family of conditional pdf’s is also associated with the joint pdf. For example, the pdf of $X_n$ given the values of $X_1, \ldots, X_{n-1}$ is given by

$$f_{X_n}(x_n \mid x_1, \ldots, x_{n-1}) = \frac{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)}{f_{X_1, \ldots, X_{n-1}}(x_1, \ldots, x_{n-1})} \tag{6.15a}$$

if $f_{X_1, \ldots, X_{n-1}}(x_1, \ldots, x_{n-1}) > 0$.

Repeated applications of Eq. (6.15a) yield an expression analogous to Eq. (6.9b):

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_n}(x_n \mid x_1, \ldots, x_{n-1})\, f_{X_{n-1}}(x_{n-1} \mid x_1, \ldots, x_{n-2}) \cdots f_{X_2}(x_2 \mid x_1)\, f_{X_1}(x_1). \tag{6.15b}$$

Example 6.6 
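The random variables $X_1$, $X_2$, and $X_3$ have the joint Gaussian pdf

$$f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\, x_1 x_2 + \frac{1}{2} x_3^2)}}{2\pi\sqrt{\pi}}.$$

Find the marginal pdf of $X_1$ and $X_3$. Find the conditional pdf of $X_2$ given $X_1$ and $X_3$.
The marginal pdf for the pair $X_1$ and $X_3$ is found by integrating the joint pdf over $x_2$:

$$f_{X_1, X_3}(x_1, x_3) = \frac{e^{-x_3^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\, x_1 x_2)}}{2\pi/\sqrt{2}}\, dx_2.$$

The above integral was carried out in Example 5.18 with $\rho = -1/\sqrt{2}$. By substituting the result of the integration above, we obtain

$$f_{X_1, X_3}(x_1, x_3) = \frac{e^{-x_3^2/2}}{\sqrt{2\pi}} \frac{e^{-x_1^2/2}}{\sqrt{2\pi}}.$$

Therefore $X_1$ and $X_3$ are independent zero-mean, unit-variance Gaussian random variables.
The conditional pdf of $X_2$ given $X_1$ and $X_3$ is:

$$f_{X_2}(x_2 \mid x_1, x_3) = \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\, x_1 x_2 + \frac{1}{2} x_3^2)}}{2\pi\sqrt{\pi}} \cdot \frac{\sqrt{2\pi}\,\sqrt{2\pi}}{e^{-x_3^2/2}\, e^{-x_1^2/2}}$$
$$= \frac{e^{-(\frac{1}{2} x_1^2 + x_2^2 - \sqrt{2}\, x_1 x_2)}}{\sqrt{\pi}} = \frac{e^{-(x_2 - x_1/\sqrt{2})^2}}{\sqrt{\pi}}.$$

We conclude that $X_2$ given $X_1$ and $X_3$ is a Gaussian random variable with mean $x_1/\sqrt{2}$ and variance 1/2.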


Example 6.7 Multiplicative Sequence 


Let $X_1$ be uniform in [0, 1], $X_2$ be uniform in $[0, X_1]$, and $X_3$ be uniform in $[0, X_2]$. (Note that $X_3$ is also the product of three uniform random variables.) Find the joint pdf of $\mathbf{X}$ and the marginal pdf of $X_3$.
For 0 < z < y < x < 1, the joint pdf is nonzero and given by:

$$f_{X_1, X_2, X_3}(x, y, z) = f_{X_3}(z \mid x, y)\, f_{X_2}(y \mid x)\, f_{X_1}(x) = \frac{1}{y} \cdot \frac{1}{x} \cdot 1 = \frac{1}{xy}.$$

The joint pdf of $X_2$ and $X_3$ is nonzero for 0 < z < y < 1 and is obtained by integrating x between y and 1:

$$f_{X_2, X_3}(y, z) = \int_y^1 \frac{1}{xy}\, dx = \frac{1}{y} \ln x \Big|_y^1 = \frac{1}{y} \ln \frac{1}{y}.$$

We obtain the pdf of $X_3$ by integrating y between z and 1:

$$f_{X_3}(z) = -\int_z^1 \frac{1}{y} \ln y\, dy = -\frac{1}{2} (\ln y)^2 \Big|_z^1 = \frac{1}{2} (\ln z)^2.$$

Note that the pdf of $X_3$ is concentrated at the values close to z = 0.
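
A quick simulation can make the marginal pdf of $X_3$ more concrete. The sketch below is an illustrative check under stated assumptions, not part of the original example: it draws the three nested uniforms with Python's standard `random` module and compares the empirical cdf of $X_3$ with the cdf obtained by integrating $(\ln t)^2/2$ from 0 to z, namely $z[(\ln z)^2 - 2\ln z + 2]/2$. The seed, sample size, and test points are arbitrary choices.

```python
import math
import random

def sample_x3(rng):
    """Draw X1 ~ U[0,1], X2 ~ U[0,X1], X3 ~ U[0,X2] and return X3."""
    x1 = rng.random()
    x2 = rng.uniform(0.0, x1)
    return rng.uniform(0.0, x2)

def cdf_x3(z):
    """Integral of (ln t)^2 / 2 over (0, z], valid for 0 < z <= 1."""
    return 0.5 * z * (math.log(z) ** 2 - 2.0 * math.log(z) + 2.0)

rng = random.Random(12345)   # fixed seed so the run is repeatable
n = 200_000
samples = [sample_x3(rng) for _ in range(n)]
for z in (0.01, 0.1, 0.5):
    empirical = sum(s <= z for s in samples) / n
    print(z, round(empirical, 3), round(cdf_x3(z), 3))
```

The large cdf values at small z reflect the concentration of probability near zero noted at the end of the example.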


6.1.3 Independence

The collection of random variables $X_1, \ldots, X_n$ is independent if

$$P[X_1 \text{ in } A_1, X_2 \text{ in } A_2, \ldots, X_n \text{ in } A_n] = P[X_1 \text{ in } A_1]\, P[X_2 \text{ in } A_2] \cdots P[X_n \text{ in } A_n]$$

for any one-dimensional events $A_1, \ldots, A_n$. It can be shown that $X_1, \ldots, X_n$ are independent if and only if

$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n) \tag{6.16}$$

for all $x_1, \ldots, x_n$. If the random variables are discrete, Eq. (6.16) is equivalent to

$$p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n) \qquad \text{for all } x_1, \ldots, x_n.$$

If the random variables are jointly continuous, Eq. (6.16) is equivalent to

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$$

for all $x_1, \ldots, x_n$.


Example 6.8 
The n samples $X_1, X_2, \ldots, X_n$ of a noise signal have joint pdf given by

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \frac{e^{-(x_1^2 + \cdots + x_n^2)/2}}{(2\pi)^{n/2}} \qquad \text{for all } x_1, \ldots, x_n.$$

It is clear that the above is the product of n one-dimensional Gaussian pdf’s. Thus $X_1, \ldots, X_n$ are independent Gaussian random variables.


6.2 FUNCTIONS OF SEVERAL RANDOM VARIABLES


Functions of vector random variables arise naturally in random experiments. For example $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ may correspond to observations from n repetitions of an experiment that generates a given random variable. We are almost always interested in the sample mean and the sample variance of the observations. In another example $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ may correspond to samples of a speech waveform and we may be interested in extracting features that are defined as functions of $\mathbf{X}$ for use in a speech recognition system.


One Function of Several Random Variables

Let the random variable Z be defined as a function of several random variables:

$$Z = g(X_1, X_2, \ldots, X_n). \tag{6.17}$$

The cdf of Z is found by finding the equivalent event of $\{Z \le z\}$, that is, the set $R_z = \{\mathbf{x} : g(\mathbf{x}) \le z\}$, then

$$F_Z(z) = P[\mathbf{X} \text{ in } R_z] = \int \cdots \int_{\mathbf{x} \text{ in } R_z} f_{X_1, \ldots, X_n}(x_1', \ldots, x_n')\, dx_1' \cdots dx_n'. \tag{6.18}$$

The pdf of Z is then found by taking the derivative of $F_Z(z)$.


Example 6.9 Maximum and Minimum of n Random Variables 


Let $W = \max(X_1, X_2, \ldots, X_n)$ and $Z = \min(X_1, X_2, \ldots, X_n)$, where the $X_i$ are independent random variables with the same distribution. Find $F_W(w)$ and $F_Z(z)$.
The maximum of $X_1, X_2, \ldots, X_n$ is less than or equal to w if and only if each $X_i$ is less than or equal to w, so:

$$F_W(w) = P[\max(X_1, X_2, \ldots, X_n) \le w] = P[X_1 \le w]\, P[X_2 \le w] \cdots P[X_n \le w] = (F_X(w))^n.$$

The minimum of $X_1, X_2, \ldots, X_n$ is greater than z if and only if each $X_i$ is greater than z, so:

$$1 - F_Z(z) = P[\min(X_1, X_2, \ldots, X_n) > z] = P[X_1 > z]\, P[X_2 > z] \cdots P[X_n > z] = (1 - F_X(z))^n$$

and

$$F_Z(z) = 1 - (1 - F_X(z))^n.$$


Example 6.10 Merging of Independent Poisson Arrivals 


Web page requests arrive at a server from n independent sources. Source j generates requests with exponentially distributed interarrival times with rate $\lambda_j$. Find the distribution of the interarrival times between consecutive requests at the server.

Let the interarrival times for the different sources be given by $X_1, X_2, \ldots, X_n$. Each $X_j$ satisfies the memoryless property, so the time that has elapsed since the last arrival from each source is irrelevant. The time until the next arrival at the server is then:

$$Z = \min(X_1, X_2, \ldots, X_n).$$

Therefore the cdf of Z satisfies:

$$1 - F_Z(z) = P[\min(X_1, X_2, \ldots, X_n) > z] = P[X_1 > z]\, P[X_2 > z] \cdots P[X_n > z]$$
$$= (1 - F_{X_1}(z))(1 - F_{X_2}(z)) \cdots (1 - F_{X_n}(z))$$
$$= e^{-\lambda_1 z} e^{-\lambda_2 z} \cdots e^{-\lambda_n z} = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n) z}.$$

The interarrival time is an exponential random variable with rate $\lambda_1 + \lambda_2 + \cdots + \lambda_n$.
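
The merging result can be illustrated with a short simulation. The sketch below is a hedged illustration, not part of the original example: the three rates, the seed, and the sample size are hypothetical, and the minimum of independent exponential draws stands in for the time to the next request. If the result holds, the sample mean should be close to $1/(\lambda_1 + \lambda_2 + \lambda_3)$ and the tail fraction close to $e^{-(\lambda_1 + \lambda_2 + \lambda_3) z}$.

```python
import math
import random

def merged_interarrival(rates, rng):
    """Time to the next request: minimum of independent exponential times,
    one per source, where source j has rate rates[j]."""
    return min(rng.expovariate(lam) for lam in rates)

rates = [0.5, 1.0, 2.5]          # hypothetical per-source rates (sum = 4.0)
rng = random.Random(2024)        # fixed seed so the run is repeatable
n = 200_000
samples = [merged_interarrival(rates, rng) for _ in range(n)]

total_rate = sum(rates)
sample_mean = sum(samples) / n
tail_fraction = sum(s > 0.5 for s in samples) / n
print(sample_mean, 1.0 / total_rate)
print(tail_fraction, math.exp(-total_rate * 0.5))
```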


Example 6.11 Reliability of Redundant Systems 


A computing cluster has n independent redundant subsystems. Each subsystem has an exponentially distributed lifetime with parameter $\lambda$. The cluster will operate as long as at least one subsystem is functioning. Find the cdf of the time until the system fails.

Let the lifetime of each subsystem be given by $X_1, X_2, \ldots, X_n$. The time until the last subsystem fails is:

$$W = \max(X_1, X_2, \ldots, X_n).$$

Therefore the cdf of W is:

$$F_W(w) = (F_X(w))^n = (1 - e^{-\lambda w})^n = 1 - \binom{n}{1} e^{-\lambda w} + \binom{n}{2} e^{-2\lambda w} + \cdots.$$
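
The closed form and its binomial expansion are easy to evaluate numerically. The following sketch is illustrative only (the function names and the parameter values are assumptions, not from the text); it computes the cdf both ways so the two expressions can be compared, and prints the probability of failure by time w = 1 for a few cluster sizes.

```python
import math

def cluster_failure_cdf(w, n, lam):
    """P[W <= w] for W = max of n independent exponential(lam) lifetimes."""
    return (1.0 - math.exp(-lam * w)) ** n

def cluster_failure_cdf_expanded(w, n, lam):
    """Same cdf via the binomial expansion: sum_k (-1)^k C(n, k) e^{-k lam w}."""
    return sum((-1) ** k * math.comb(n, k) * math.exp(-k * lam * w)
               for k in range(n + 1))

# Adding redundant subsystems lowers the probability of failure by time w.
for n in (1, 2, 4):
    print(n, cluster_failure_cdf(1.0, n, lam=1.0))
```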


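6.2.2 Transformations of Random Vectors


Let $X_1, \ldots, X_n$ be random variables in some experiment, and let the random variables $Z_1, \ldots, Z_n$ be defined by a transformation that consists of n functions of $\mathbf{X} = (X_1, \ldots, X_n)$:

$$Z_1 = g_1(\mathbf{X}) \qquad Z_2 = g_2(\mathbf{X}) \qquad \cdots \qquad Z_n = g_n(\mathbf{X}).$$

The joint cdf of $\mathbf{Z} = (Z_1, \ldots, Z_n)$ at the point $\mathbf{z} = (z_1, \ldots, z_n)$ is equal to the probability of the region of $\mathbf{x}$ where $g_k(\mathbf{x}) \le z_k$ for $k = 1, \ldots, n$:

$$F_{Z_1, \ldots, Z_n}(z_1, \ldots, z_n) = P[g_1(\mathbf{X}) \le z_1, \ldots, g_n(\mathbf{X}) \le z_n]. \tag{6.19a}$$

If $X_1, \ldots, X_n$ have a joint pdf, then

$$F_{Z_1, \ldots, Z_n}(z_1, \ldots, z_n) = \int \cdots \int_{\mathbf{x}' : g_k(\mathbf{x}') \le z_k} f_{X_1, \ldots, X_n}(x_1', \ldots, x_n')\, dx_1' \cdots dx_n'. \tag{6.19b}$$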


Example 6.12 
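Given a random vector $\mathbf{X}$, find the joint pdf of the following transformation:

$$Z_1 = g_1(X_1) = a_1 X_1 + b_1,$$
$$Z_2 = g_2(X_2) = a_2 X_2 + b_2,$$
$$\vdots$$
$$Z_n = g_n(X_n) = a_n X_n + b_n.$$

Note that $Z_k = a_k X_k + b_k \le z_k$ if and only if $X_k \le (z_k - b_k)/a_k$, if $a_k > 0$, so

$$F_{Z_1, Z_2, \ldots, Z_n}(z_1, z_2, \ldots, z_n) = P\left[X_1 \le \frac{z_1 - b_1}{a_1}, X_2 \le \frac{z_2 - b_2}{a_2}, \ldots, X_n \le \frac{z_n - b_n}{a_n}\right]$$
$$= F_{X_1, X_2, \ldots, X_n}\left(\frac{z_1 - b_1}{a_1}, \frac{z_2 - b_2}{a_2}, \ldots, \frac{z_n - b_n}{a_n}\right)$$

$$f_{Z_1, Z_2, \ldots, Z_n}(z_1, z_2, \ldots, z_n) = \frac{\partial^n}{\partial z_1 \cdots \partial z_n} F_{Z_1, Z_2, \ldots, Z_n}(z_1, z_2, \ldots, z_n)$$
$$= \frac{1}{a_1 \cdots a_n} f_{X_1, X_2, \ldots, X_n}\left(\frac{z_1 - b_1}{a_1}, \frac{z_2 - b_2}{a_2}, \ldots, \frac{z_n - b_n}{a_n}\right).$$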


*6.2.3 pdf of General Transformations 


We now introduce a general method for finding the pdf of a transformation of n jointly continuous random variables. We first develop the two-dimensional case. Let the random variables V and W be defined by two functions of X and Y:

$$V = g_1(X, Y) \quad \text{and} \quad W = g_2(X, Y). \tag{6.20}$$

Assume that the functions v(x, y) and w(x, y) are invertible in the sense that the equations $v = g_1(x, y)$ and $w = g_2(x, y)$ can be solved for x and y, that is,

$$x = h_1(v, w) \quad \text{and} \quad y = h_2(v, w).$$

The joint pdf of V and W is found by finding the equivalent event of infinitesimal rectangles. The image of the infinitesimal rectangle is shown in Fig. 6.1(a). The image can be approximated by the parallelogram shown in Fig. 6.1(b) by making the approximation

$$g_k(x + dx, y) \simeq g_k(x, y) + \frac{\partial}{\partial x} g_k(x, y)\, dx \qquad k = 1, 2$$

and similarly for the y variable. The probabilities of the infinitesimal rectangle and the parallelogram are approximately equal, therefore

$$f_{X,Y}(x, y)\, dx\, dy = f_{V,W}(v, w)\, dP$$

and

$$f_{V,W}(v, w) = \frac{f_{X,Y}(h_1(v, w), h_2(v, w))}{\left| \dfrac{dP}{dx\, dy} \right|}, \tag{6.21}$$

where dP is the area of the parallelogram. By analogy with the case of a linear transformation (see Eq. 5.59), we can match the derivatives in the above approximations with the coefficients in the linear transformations and conclude that the “stretch factor” at the point (v, w) is given by the determinant of a matrix of partial derivatives:

$$J(x, y) = \det \begin{bmatrix} \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \\[2mm] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{bmatrix}.$$

FIGURE 6.1
(a) Image of an infinitesimal rectangle under general transformation. (b) Approximation of image by a parallelogram.

The determinant J(x, y) is called the Jacobian of the transformation. The Jacobian of the inverse transformation is given by

$$J(v, w) = \det \begin{bmatrix} \dfrac{\partial x}{\partial v} & \dfrac{\partial x}{\partial w} \\[2mm] \dfrac{\partial y}{\partial v} & \dfrac{\partial y}{\partial w} \end{bmatrix}.$$

It can be shown that

$$|J(v, w)| = \frac{1}{|J(x, y)|}.$$

We therefore conclude that the joint pdf of V and W can be found using either of the following expressions:

$$f_{V,W}(v, w) = \frac{f_{X,Y}(h_1(v, w), h_2(v, w))}{|J(x, y)|} \tag{6.22a}$$
$$= f_{X,Y}(h_1(v, w), h_2(v, w))\, |J(v, w)|. \tag{6.22b}$$

It should be noted that Eq. (6.21) is applicable even if Eq. (6.20) has more than one solution; the pdf is then equal to the sum of terms of the form given by Eqs. (6.22a) and (6.22b), with each solution providing one such term.


Example 6.13 


Server 1 receives m Web page requests and server 2 receives k Web page requests. Web page transmission times are exponential random variables with mean $1/\mu$. Let X be the total time to transmit files from server 1 and let Y be the total time for server 2. Find the joint pdf for T, the total transmission time, and W, the proportion of the total transmission time contributed by server 1:

$$T = X + Y \quad \text{and} \quad W = \frac{X}{X + Y}.$$

From Chapter 4, the sum of j independent exponential random variables is an Erlang random variable with parameters j and $\mu$. Therefore X and Y are independent Erlang random variables with parameters m and $\mu$, and k and $\mu$, respectively:

$$f_X(x) = \frac{\mu e^{-\mu x} (\mu x)^{m-1}}{(m-1)!} \quad \text{and} \quad f_Y(y) = \frac{\mu e^{-\mu y} (\mu y)^{k-1}}{(k-1)!}.$$

We solve for X and Y in terms of T and W:

$$X = TW \quad \text{and} \quad Y = T(1 - W).$$

The Jacobian of the transformation is:

$$J(x, y) = \det \begin{bmatrix} 1 & 1 \\[1mm] \dfrac{y}{(x+y)^2} & \dfrac{-x}{(x+y)^2} \end{bmatrix} = \frac{-x}{(x+y)^2} - \frac{y}{(x+y)^2} = \frac{-1}{x+y} = \frac{-1}{t}.$$

The joint pdf of T and W is then:

$$f_{T,W}(t, w) = \frac{1}{|J(x, y)|} \left[ \frac{\mu e^{-\mu x} (\mu x)^{m-1}}{(m-1)!} \frac{\mu e^{-\mu y} (\mu y)^{k-1}}{(k-1)!} \right]_{x = tw,\ y = t(1-w)}$$
$$= t\, \frac{\mu e^{-\mu t w} (\mu t w)^{m-1}}{(m-1)!} \frac{\mu e^{-\mu t (1-w)} (\mu t (1-w))^{k-1}}{(k-1)!}$$
$$= \frac{\mu e^{-\mu t} (\mu t)^{m+k-1}}{(m+k-1)!} \cdot \frac{(m+k-1)!}{(m-1)!\,(k-1)!}\, w^{m-1} (1-w)^{k-1}.$$

We see that T and W are independent random variables. As expected, T is Erlang with parameters m + k and $\mu$, since it is the sum of m + k independent exponential random variables. W is the beta random variable introduced in Chapter 3.
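
The relation $|J(v, w)| = 1/|J(x, y)|$ can be checked numerically for the transformation of this example. The sketch below is an illustrative check, not a general-purpose tool: `numerical_jacobian` is a hypothetical helper that approximates the partial derivatives by central differences, and the evaluation point (x, y) = (1.5, 2.5) is arbitrary. At that point the example's formulas give $J(x, y) = -1/t = -0.25$ and, for the inverse map $x = tw$, $y = t(1 - w)$, a Jacobian of $-t = -4$.

```python
def forward(x, y):
    """(x, y) -> (t, w) with t = x + y and w = x / (x + y)."""
    return x + y, x / (x + y)

def inverse(t, w):
    """(t, w) -> (x, y) with x = t*w and y = t*(1 - w)."""
    return t * w, t * (1.0 - w)

def numerical_jacobian(func, a, b, h=1e-6):
    """Determinant of the 2x2 matrix of partial derivatives of func at (a, b),
    approximated with central differences of step h."""
    f1_pa, f2_pa = func(a + h, b)
    f1_ma, f2_ma = func(a - h, b)
    f1_pb, f2_pb = func(a, b + h)
    f1_mb, f2_mb = func(a, b - h)
    d11 = (f1_pa - f1_ma) / (2 * h)   # d f1 / d a
    d12 = (f1_pb - f1_mb) / (2 * h)   # d f1 / d b
    d21 = (f2_pa - f2_ma) / (2 * h)   # d f2 / d a
    d22 = (f2_pb - f2_mb) / (2 * h)   # d f2 / d b
    return d11 * d22 - d12 * d21

x, y = 1.5, 2.5
t, w = forward(x, y)
j_xy = numerical_jacobian(forward, x, y)
j_tw = numerical_jacobian(inverse, t, w)
print(j_xy, j_tw, abs(j_xy) * abs(j_tw))
```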


The method developed above can be used even if we are interested in only one 
function of a random variable. By defining an “auxiliary” variable, we can use the 
transformation method to find the joint pdf of both random variables, and then we can 
find the marginal pdf involving the random variable of interest. The following example 
demonstrates the method. 


Example 6.14 Student's t-distribution 


Let X be a zero-mean, unit-variance Gaussian random variable and let Y be a chi-square random variable with n degrees of freedom. Assume that X and Y are independent. Find the pdf of

$$V = X / \sqrt{Y/n}.$$

Define the auxiliary function of W = Y. The variables X and Y are then related to V and W by

$$X = V \sqrt{W/n} \quad \text{and} \quad Y = W.$$

The Jacobian of the inverse transformation is

$$|J(v, w)| = \left| \det \begin{bmatrix} \sqrt{w/n} & (v/2)/\sqrt{wn} \\ 0 & 1 \end{bmatrix} \right| = \sqrt{w/n}.$$

Since $f_{X,Y}(x, y) = f_X(x) f_Y(y)$, the joint pdf of V and W is thus

$$f_{V,W}(v, w) = \left. \frac{e^{-x^2/2}}{\sqrt{2\pi}} \frac{(y/2)^{n/2-1} e^{-y/2}}{2\Gamma(n/2)}\, |J(v, w)| \right|_{x = v\sqrt{w/n},\ y = w}$$
$$= \frac{(w/2)^{(n-1)/2}\, e^{-[(w/2)(1 + v^2/n)]}}{2\sqrt{n\pi}\, \Gamma(n/2)}.$$

The pdf of V is found by integrating the joint pdf over w:

$$f_V(v) = \frac{1}{2\sqrt{n\pi}\, \Gamma(n/2)} \int_0^{\infty} (w/2)^{(n-1)/2}\, e^{-[(w/2)(1 + v^2/n)]}\, dw.$$

If we let $w' = (w/2)(v^2/n + 1)$, the integral becomes

$$f_V(v) = \frac{(v^2/n + 1)^{-(n+1)/2}}{\sqrt{n\pi}\, \Gamma(n/2)} \int_0^{\infty} (w')^{(n-1)/2} e^{-w'}\, dw'.$$

By noting that the above integral is the gamma function evaluated at (n + 1)/2, we finally obtain the Student's t-distribution:

$$f_V(v) = \frac{(1 + v^2/n)^{-(n+1)/2}\, \Gamma((n+1)/2)}{\sqrt{n\pi}\, \Gamma(n/2)}.$$

This pdf is used extensively in statistical calculations. (See Chapter 8.)
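
The final expression is straightforward to evaluate with the gamma function from Python's standard `math` module. The sketch below is a small sanity check under stated assumptions rather than a production routine: `student_t_pdf` and `integrate` are illustrative names, the midpoint rule is a deliberately simple integrator, and `math.gamma` overflows for large arguments, so for large n a log-gamma formulation (`math.lgamma`) would be the safer choice.

```python
import math

def student_t_pdf(v, n):
    """pdf of V = X / sqrt(Y/n):
    Gamma((n+1)/2) * (1 + v^2/n)^(-(n+1)/2) / (sqrt(n*pi) * Gamma(n/2))."""
    norm = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return norm * (1.0 + v * v / n) ** (-(n + 1) / 2)

def integrate(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Peak height at v = 0 for a few degrees of freedom.
for n in (1, 5, 30):
    print(n, student_t_pdf(0.0, n))
```

For n = 1 the formula reduces to the Cauchy pdf $1/[\pi(1 + v^2)]$, which gives a convenient closed-form check.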


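Next consider the problem of finding the joint pdf for n functions of n random variables $\mathbf{X} = (X_1, \ldots, X_n)$:

$$Z_1 = g_1(\mathbf{X}), \quad Z_2 = g_2(\mathbf{X}), \quad \ldots, \quad Z_n = g_n(\mathbf{X}).$$

We assume as before that the set of equations

$$z_1 = g_1(\mathbf{x}), \quad z_2 = g_2(\mathbf{x}), \quad \ldots, \quad z_n = g_n(\mathbf{x}) \tag{6.23}$$

has a unique solution given by

$$x_1 = h_1(\mathbf{z}), \quad x_2 = h_2(\mathbf{z}), \quad \ldots, \quad x_n = h_n(\mathbf{z}).$$

The joint pdf of $\mathbf{Z}$ is then given by

$$f_{Z_1, \ldots, Z_n}(z_1, \ldots, z_n) = \frac{f_{X_1, \ldots, X_n}(h_1(\mathbf{z}), h_2(\mathbf{z}), \ldots, h_n(\mathbf{z}))}{|J(x_1, x_2, \ldots, x_n)|} \tag{6.24a}$$
$$= f_{X_1, \ldots, X_n}(h_1(\mathbf{z}), h_2(\mathbf{z}), \ldots, h_n(\mathbf{z}))\, |J(z_1, z_2, \ldots, z_n)|, \tag{6.24b}$$

where $|J(x_1, \ldots, x_n)|$ and $|J(z_1, \ldots, z_n)|$ are the determinants of the transformation and the inverse transformation, respectively,

$$J(x_1, \ldots, x_n) = \det \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{bmatrix}$$

and

$$J(z_1, \ldots, z_n) = \det \begin{bmatrix} \dfrac{\partial h_1}{\partial z_1} & \cdots & \dfrac{\partial h_1}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial h_n}{\partial z_1} & \cdots & \dfrac{\partial h_n}{\partial z_n} \end{bmatrix}.$$

In the special case of a linear transformation we have:

$$\mathbf{Z} = \mathbf{A}\mathbf{X} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}.$$

The components of $\mathbf{Z}$ are:

$$Z_j = a_{j1} X_1 + a_{j2} X_2 + \cdots + a_{jn} X_n.$$

Since $\partial z_j / \partial x_i = a_{ji}$, the Jacobian is then simply:

$$J(x_1, x_2, \ldots, x_n) = \det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \det \mathbf{A}.$$

Assuming that $\mathbf{A}$ is invertible,¹ we then have that:

$$f_{\mathbf{Z}}(\mathbf{z}) = \left. \frac{f_{\mathbf{X}}(\mathbf{x})}{|\det \mathbf{A}|} \right|_{\mathbf{x} = \mathbf{A}^{-1}\mathbf{z}} = \frac{f_{\mathbf{X}}(\mathbf{A}^{-1}\mathbf{z})}{|\det \mathbf{A}|}.$$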

Example 6.15 Sum of Random Variables 
Given a random vector $\mathbf{X} = (X_1, X_2, X_3)$, find the pdf of the sum:

$$Z = X_1 + X_2 + X_3.$$

We will use the transformation by introducing auxiliary variables as follows:

$$Z_1 = X_1, \quad Z_2 = X_1 + X_2, \quad Z_3 = X_1 + X_2 + X_3.$$

The inverse transformation is given by:

$$X_1 = Z_1, \quad X_2 = Z_2 - Z_1, \quad X_3 = Z_3 - Z_2.$$

The Jacobian is:

$$J(x_1, x_2, x_3) = \det \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} = 1.$$

Therefore the joint pdf of $\mathbf{Z}$ is

$$f_{\mathbf{Z}}(z_1, z_2, z_3) = f_{\mathbf{X}}(z_1, z_2 - z_1, z_3 - z_2).$$

The pdf of $Z_3$ is obtained by integrating with respect to $z_1$ and $z_2$:

$$f_{Z_3}(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{\mathbf{X}}(z_1, z_2 - z_1, z - z_2)\, dz_1\, dz_2.$$

This expression can be simplified further if $X_1$, $X_2$, and $X_3$ are independent random variables.


1 Appendix C provides a summary of definitions and useful results from linear algebra.


6.3 EXPECTED VALUES OF VECTOR RANDOM VARIABLES


In this section we are interested in the characterization of a vector random variable 
through the expected values of its components and of functions of its components. We 
focus on the characterization of a vector random variable through its mean vector and 
its covariance matrix. We then introduce the joint characteristic function for a vector 
random variable. 

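The expected value of a function $g(\mathbf{X}) = g(X_1, \ldots, X_n)$ of a vector random variable $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is given by:

$$E[g(\mathbf{X})] = \begin{cases} \displaystyle \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\, f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n & \mathbf{X} \text{ jointly continuous} \\[3mm] \displaystyle \sum_{x_1} \cdots \sum_{x_n} g(x_1, x_2, \ldots, x_n)\, p_{\mathbf{X}}(x_1, x_2, \ldots, x_n) & \mathbf{X} \text{ discrete.} \end{cases} \tag{6.25}$$

An important example is $g(\mathbf{X})$ equal to the sum of functions of $\mathbf{X}$. The procedure leading to Eq. (5.26) and a simple induction argument show that:

$$E[g_1(\mathbf{X}) + g_2(\mathbf{X}) + \cdots + g_n(\mathbf{X})] = E[g_1(\mathbf{X})] + \cdots + E[g_n(\mathbf{X})]. \tag{6.26}$$

Another important example is $g(\mathbf{X})$ equal to the product of n individual functions of the components. If $X_1, \ldots, X_n$ are independent random variables, then

$$E[g_1(X_1)\, g_2(X_2) \cdots g_n(X_n)] = E[g_1(X_1)]\, E[g_2(X_2)] \cdots E[g_n(X_n)]. \tag{6.27}$$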


Mean Vector and Covariance Matrix 


The mean, variance, and covariance provide useful information about the distribu- 
tion of a random variable and are easy to estimate, so we are frequently interested 
in characterizing multiple random variables in terms of their first and second mo- 
ments. We now introduce the mean vector and the covariance matrix. We then in- 
vestigate the mean vector and the covariance matrix of a linear transformation of a 
random vector. 

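For $\mathbf{X} = (X_1, X_2, \ldots, X_n)$, the mean vector is defined as the column vector of expected values of the components $X_k$:

$$\mathbf{m}_{\mathbf{X}} = E[\mathbf{X}] = E \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \triangleq \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}. \tag{6.28a}$$

Note that we define the vector of expected values as a column vector. In previous sections we have sometimes written $\mathbf{X}$ as a row vector, but in this section and wherever we deal with matrix transformations, we will represent $\mathbf{X}$ and its expected value as a column vector.

The correlation matrix has the second moments of $\mathbf{X}$ as its entries:

$$\mathbf{R}_{\mathbf{X}} = \begin{bmatrix} E[X_1^2] & E[X_1 X_2] & \cdots & E[X_1 X_n] \\ E[X_2 X_1] & E[X_2^2] & \cdots & E[X_2 X_n] \\ \vdots & \vdots & & \vdots \\ E[X_n X_1] & E[X_n X_2] & \cdots & E[X_n^2] \end{bmatrix}. \tag{6.28b}$$

The covariance matrix has the second-order central moments as its entries:

$$\mathbf{K}_{\mathbf{X}} = \begin{bmatrix} E[(X_1 - m_1)^2] & E[(X_1 - m_1)(X_2 - m_2)] & \cdots & E[(X_1 - m_1)(X_n - m_n)] \\ E[(X_2 - m_2)(X_1 - m_1)] & E[(X_2 - m_2)^2] & \cdots & E[(X_2 - m_2)(X_n - m_n)] \\ \vdots & \vdots & & \vdots \\ E[(X_n - m_n)(X_1 - m_1)] & E[(X_n - m_n)(X_2 - m_2)] & \cdots & E[(X_n - m_n)^2] \end{bmatrix}. \tag{6.28c}$$

Both $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{K}_{\mathbf{X}}$ are n × n symmetric matrices. The diagonal elements of $\mathbf{K}_{\mathbf{X}}$ are given by the variances $\mathrm{VAR}[X_k] = E[(X_k - m_k)^2]$ of the elements of $\mathbf{X}$. If these elements are uncorrelated, then $\mathrm{COV}(X_j, X_k) = 0$ for $j \ne k$, and $\mathbf{K}_{\mathbf{X}}$ is a diagonal matrix. If the random variables $X_1, \ldots, X_n$ are independent, then they are uncorrelated and $\mathbf{K}_{\mathbf{X}}$ is diagonal. Finally, if the vector of expected values is $\mathbf{0}$, that is, $m_k = E[X_k] = 0$ for all k, then $\mathbf{R}_{\mathbf{X}} = \mathbf{K}_{\mathbf{X}}$.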


Example 6.16 


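Let $\mathbf{X} = (X_1, X_2, X_3)$ be the jointly Gaussian random vector from Example 6.6. Find $E[\mathbf{X}]$ and $\mathbf{K}_{\mathbf{X}}$.
We rewrite the joint pdf as follows:

$$f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\, x_1 x_2)}}{2\pi \sqrt{1 - \left(-\dfrac{1}{\sqrt{2}}\right)^2}} \cdot \frac{e^{-x_3^2/2}}{\sqrt{2\pi}}.$$

We see that $X_3$ is a Gaussian random variable with zero mean and unit variance, and that it is independent of $X_1$ and $X_2$. We also see that $X_1$ and $X_2$ are jointly Gaussian with zero mean and unit variance, and with correlation coefficient

$$\rho_{X_1 X_2} = -\frac{1}{\sqrt{2}} = \frac{\mathrm{COV}(X_1, X_2)}{\sigma_{X_1} \sigma_{X_2}} = \mathrm{COV}(X_1, X_2).$$

Therefore the vector of expected values is: $\mathbf{m}_{\mathbf{X}} = \mathbf{0}$, and

$$\mathbf{K}_{\mathbf{X}} = \begin{bmatrix} 1 & -\dfrac{1}{\sqrt{2}} & 0 \\[2mm] -\dfrac{1}{\sqrt{2}} & 1 & 0 \\[2mm] 0 & 0 & 1 \end{bmatrix}.$$


We now develop compact expressions for $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{K}_{\mathbf{X}}$. If we multiply $\mathbf{X}$, an n × 1 matrix, and $\mathbf{X}^{\mathrm T}$, a 1 × n matrix, we obtain the following n × n matrix:

$$\mathbf{X}\mathbf{X}^{\mathrm T} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} [X_1, X_2, \ldots, X_n] = \begin{bmatrix} X_1^2 & X_1 X_2 & \cdots & X_1 X_n \\ X_2 X_1 & X_2^2 & \cdots & X_2 X_n \\ \vdots & \vdots & & \vdots \\ X_n X_1 & X_n X_2 & \cdots & X_n^2 \end{bmatrix}.$$

If we define the expected value of a matrix to be the matrix of expected values of the matrix elements, then we can write the correlation matrix as:

$$\mathbf{R}_{\mathbf{X}} = E[\mathbf{X}\mathbf{X}^{\mathrm T}]. \tag{6.29a}$$

The covariance matrix is then:

$$\mathbf{K}_{\mathbf{X}} = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^{\mathrm T}]$$
$$= E[\mathbf{X}\mathbf{X}^{\mathrm T}] - \mathbf{m}_{\mathbf{X}} E[\mathbf{X}^{\mathrm T}] - E[\mathbf{X}]\, \mathbf{m}_{\mathbf{X}}^{\mathrm T} + \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^{\mathrm T}$$
$$= \mathbf{R}_{\mathbf{X}} - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^{\mathrm T}. \tag{6.29b}$$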

Linear Transformations of Random Vectors 


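Many engineering systems are linear in the sense that will be elaborated on in Chapter 10. Frequently these systems can be reduced to a linear transformation of a vector of random variables where the “input” is $\mathbf{X}$ and the “output” is $\mathbf{Y}$:

$$\mathbf{Y} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} = \mathbf{A}\mathbf{X}.$$

The expected value of the kth component of $\mathbf{Y}$ is the inner product (dot product) of the kth row of $\mathbf{A}$ and $E[\mathbf{X}]$:

$$E[Y_k] = E\left[ \sum_{j=1}^{n} a_{kj} X_j \right] = \sum_{j=1}^{n} a_{kj} E[X_j].$$

Each component of $E[\mathbf{Y}]$ is obtained in this manner, so:

$$\mathbf{m}_{\mathbf{Y}} = E[\mathbf{Y}] = \begin{bmatrix} \sum_{j=1}^{n} a_{1j} E[X_j] \\ \sum_{j=1}^{n} a_{2j} E[X_j] \\ \vdots \\ \sum_{j=1}^{n} a_{nj} E[X_j] \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}$$
$$= \mathbf{A} E[\mathbf{X}] = \mathbf{A} \mathbf{m}_{\mathbf{X}}. \tag{6.30a}$$

The covariance matrix of $\mathbf{Y}$ is then:

$$\mathbf{K}_{\mathbf{Y}} = E[(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^{\mathrm T}] = E[(\mathbf{A}\mathbf{X} - \mathbf{A}\mathbf{m}_{\mathbf{X}})(\mathbf{A}\mathbf{X} - \mathbf{A}\mathbf{m}_{\mathbf{X}})^{\mathrm T}]$$
$$= E[\mathbf{A}(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^{\mathrm T} \mathbf{A}^{\mathrm T}] = \mathbf{A} E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^{\mathrm T}] \mathbf{A}^{\mathrm T}$$
$$= \mathbf{A} \mathbf{K}_{\mathbf{X}} \mathbf{A}^{\mathrm T}, \tag{6.30b}$$

where we used the fact that the transpose of a matrix multiplication is the product of the transposed matrices in reverse order: $\{\mathbf{A}(\mathbf{X} - \mathbf{m}_{\mathbf{X}})\}^{\mathrm T} = (\mathbf{X} - \mathbf{m}_{\mathbf{X}})^{\mathrm T} \mathbf{A}^{\mathrm T}$.
The cross-covariance matrix of two random vectors $\mathbf{X}$ and $\mathbf{Y}$ is defined as:

$$\mathbf{K}_{\mathbf{XY}} = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^{\mathrm T}] = E[\mathbf{X}\mathbf{Y}^{\mathrm T}] - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{Y}}^{\mathrm T} = \mathbf{R}_{\mathbf{XY}} - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{Y}}^{\mathrm T}.$$

We are interested in the cross-covariance between $\mathbf{X}$ and $\mathbf{Y} = \mathbf{A}\mathbf{X}$:

$$\mathbf{K}_{\mathbf{XY}} = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^{\mathrm T}] = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^{\mathrm T} \mathbf{A}^{\mathrm T}]$$
$$= \mathbf{K}_{\mathbf{X}} \mathbf{A}^{\mathrm T}. \tag{6.30c}$$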


Example 6.17 Transformation of Uncorrelated Random Vector 


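Suppose that the components of $\mathbf{X}$ are uncorrelated and have unit variance, then $\mathbf{K}_{\mathbf{X}} = \mathbf{I}$, the identity matrix. The covariance matrix for $\mathbf{Y} = \mathbf{A}\mathbf{X}$ is

$$\mathbf{K}_{\mathbf{Y}} = \mathbf{A} \mathbf{K}_{\mathbf{X}} \mathbf{A}^{\mathrm T} = \mathbf{A} \mathbf{I} \mathbf{A}^{\mathrm T} = \mathbf{A} \mathbf{A}^{\mathrm T}. \tag{6.31}$$

In general $\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{A}^{\mathrm T}$ is not a diagonal matrix and so the components of $\mathbf{Y}$ are correlated. In Section 6.6 we discuss how to find a matrix $\mathbf{A}$ so that Eq. (6.31) holds for a given $\mathbf{K}_{\mathbf{Y}}$. We can then generate a random vector $\mathbf{Y}$ with any desired covariance matrix $\mathbf{K}_{\mathbf{Y}}$.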
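
The effect described in this example can be observed in a small simulation. The sketch below is illustrative and rests on assumptions not in the text: the 2 × 2 matrix `A` is hypothetical, the uncorrelated unit-variance inputs are taken to be independent standard Gaussians, and the helper functions are plain-Python stand-ins for what a matrix library would provide. The sample covariance of $\mathbf{Y} = \mathbf{A}\mathbf{X}$ should be close to $\mathbf{A}\mathbf{A}^{\mathrm T}$.

```python
import random

A = [[1.0, 0.0],
     [0.5, 2.0]]          # hypothetical 2x2 transformation matrix

def mat_mul_transpose(a):
    """Return a a^T for a square matrix stored as a list of rows."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sample_covariance(vectors):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    m = len(vectors)
    n = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / m for i in range(n)]
    return [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (m - 1)
             for j in range(n)] for i in range(n)]

rng = random.Random(7)    # fixed seed so the run is repeatable
num_samples = 100_000
ys = []
for _ in range(num_samples):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # uncorrelated, unit variance
    ys.append([sum(A[i][k] * x[k] for k in range(2)) for i in range(2)])

k_theory = mat_mul_transpose(A)     # A A^T, as in Eq. (6.31)
k_sample = sample_covariance(ys)
print(k_theory)
print(k_sample)
```

The nonzero off-diagonal entry of $\mathbf{A}\mathbf{A}^{\mathrm T}$ shows up in the sample covariance as correlation between the two outputs.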


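Suppose that the components of $\mathbf{X}$ are correlated so $\mathbf{K}_{\mathbf{X}}$ is not a diagonal matrix. In many situations we are interested in finding a transformation matrix $\mathbf{A}$ so that $\mathbf{Y} = \mathbf{A}\mathbf{X}$ has uncorrelated components. This requires finding $\mathbf{A}$ so that $\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{K}_{\mathbf{X}}\mathbf{A}^{\mathrm T}$ is a diagonal matrix. In the last part of this section we show how to find such a matrix $\mathbf{A}$.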


Example 6.18 Transformation to Uncorrelated Random Vector 


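Suppose the random vector $\mathbf{X} = (X_1, X_2, X_3)$ in Example 6.16 is transformed using the matrix:

$$\mathbf{A} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Find E[Y] and $\mathbf{K}_Y$.
Since $\mathbf{m}_X = \mathbf{0}$, we have $E[\mathbf{Y}] = \mathbf{A}\mathbf{m}_X = \mathbf{0}$. The covariance matrix of Y is:

$$\mathbf{K}_Y = \mathbf{A}\mathbf{K}_X\mathbf{A}^T = \frac{1}{2}\begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} 1 & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix}$$
$$= \frac{1}{2}\begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} 1 + \frac{1}{\sqrt{2}} & 1 - \frac{1}{\sqrt{2}} & 0 \\ 1 + \frac{1}{\sqrt{2}} & -\left(1 - \frac{1}{\sqrt{2}}\right) & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix} = \begin{bmatrix} 1 + \frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 1 - \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The linear transformation has produced a vector of random variables $\mathbf{Y} = (Y_1, Y_2, Y_3)$ with
components that are uncorrelated.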
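
The matrix products in Example 6.18 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original example. The covariance matrix of Example 6.16 is not reproduced in this section, so the entries of `K_X` below are an assumption (unit variances, with COV(X1, X2) = 1/sqrt(2) and X3 uncorrelated with the others); the helper names `matmul` and `transpose` are ours.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

r = 1.0 / math.sqrt(2.0)

# Assumed covariance matrix of X (taken to be the one from Example 6.16).
K_X = [[1.0, r, 0.0],
       [r, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

# Transformation matrix of Example 6.18.
A = [[r, r, 0.0],
     [r, -r, 0.0],
     [0.0, 0.0, 1.0]]

K_Y = matmul(matmul(A, K_X), transpose(A))   # Eq. (6.30b): K_Y = A K_X A^T
K_XY = matmul(K_X, transpose(A))             # Eq. (6.30c): K_XY = K_X A^T

for row in K_Y:
    print([round(v, 6) for v in row])
```

Under these assumptions the off-diagonal entries of `K_Y` vanish up to rounding error, while `K_XY` is generally not diagonal: Y is uncorrelated across its own components, not with X.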


Joint Characteristic Function 
The joint characteristic function of n random variables is defined as

$$\Phi_{X_1, X_2, \ldots, X_n}(\omega_1, \omega_2, \ldots, \omega_n) = E[e^{j(\omega_1X_1 + \omega_2X_2 + \cdots + \omega_nX_n)}]. \tag{6.32a}$$


In this section we develop the properties of the joint characteristic function of two ran- 
dom variables. These properties generalize in straightforward fashion to the case of n 
random variables. Therefore consider 


$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1X + \omega_2Y)}]. \tag{6.32b}$$


If X and Y are jointly continuous random variables, then 


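$$\Phi_{X,Y}(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,e^{j(\omega_1x + \omega_2y)}\,dx\,dy. \tag{6.32c}$$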


Equation (6.32c) shows that the joint characteristic function is the two-dimensional 
Fourier transform of the joint pdf of X and Y. The inversion formula for the Fourier 
transform implies that the joint pdf is given by 


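$$f_{X,Y}(x, y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi_{X,Y}(\omega_1, \omega_2)\,e^{-j(\omega_1x + \omega_2y)}\,d\omega_1\,d\omega_2. \tag{6.33}$$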


Note from Eq. (6.32b) that the marginal characteristic functions can be obtained from the
joint characteristic function:

$$\Phi_X(\omega) = \Phi_{X,Y}(\omega, 0) \qquad \Phi_Y(\omega) = \Phi_{X,Y}(0, \omega). \tag{6.34}$$


If X and Y are independent random variables, then the joint characteristic function is 
the product of the marginal characteristic functions since 


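$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1X + \omega_2Y)}] = E[e^{j\omega_1X}e^{j\omega_2Y}] = E[e^{j\omega_1X}]E[e^{j\omega_2Y}] = \Phi_X(\omega_1)\Phi_Y(\omega_2), \tag{6.35}$$

where the third equality follows from Eq. (6.27).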
The characteristic function of the sum Z = aX + bY can be obtained from the
joint characteristic function of X and Y as follows:

$$\Phi_Z(\omega) = E[e^{j\omega(aX + bY)}] = E[e^{j(\omega aX + \omega bY)}] = \Phi_{X,Y}(a\omega, b\omega). \tag{6.36a}$$

If X and Y are independent random variables, the characteristic function of Z = aX + bY
is then

$$\Phi_Z(\omega) = \Phi_{X,Y}(a\omega, b\omega) = \Phi_X(a\omega)\Phi_Y(b\omega). \tag{6.36b}$$


In Section 8.1 we will use the above result in dealing with sums of random variables. 
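
As a brief illustration of Eq. (6.36b), not taken from the original text, suppose X and Y are independent Gaussian random variables with means $m_X$, $m_Y$ and variances $\sigma_X^2$, $\sigma_Y^2$ (these symbols are our assumption). Using the standard Gaussian characteristic function $e^{jm\omega - \sigma^2\omega^2/2}$, the worked steps are:

```latex
\begin{align*}
\Phi_Z(\omega) &= \Phi_X(a\omega)\,\Phi_Y(b\omega) \\
  &= e^{\,j m_X a\omega - \frac{1}{2}\sigma_X^2 a^2\omega^2}\;
     e^{\,j m_Y b\omega - \frac{1}{2}\sigma_Y^2 b^2\omega^2} \\
  &= e^{\,j(a m_X + b m_Y)\omega - \frac{1}{2}(a^2\sigma_X^2 + b^2\sigma_Y^2)\omega^2}.
\end{align*}
```

The result has the form of a Gaussian characteristic function, so in this case Z = aX + bY is Gaussian with mean $am_X + bm_Y$ and variance $a^2\sigma_X^2 + b^2\sigma_Y^2$.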

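The joint moments of X and Y (if they exist) can be obtained by taking the derivatives
of the joint characteristic function. To show this we rewrite Eq. (6.32b) as the
expected value of a product of exponentials and we expand the exponentials in a
power series:

$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j\omega_1X}e^{j\omega_2Y}] = E\left[\sum_{i=0}^{\infty}\frac{(j\omega_1X)^i}{i!}\sum_{k=0}^{\infty}\frac{(j\omega_2Y)^k}{k!}\right] = \sum_{i=0}^{\infty}\sum_{k=0}^{\infty} E[X^iY^k]\frac{(j\omega_1)^i}{i!}\frac{(j\omega_2)^k}{k!}.$$

It then follows that the moments can be obtained by taking an appropriate set of derivatives:

$$E[X^iY^k] = \frac{1}{j^{i+k}}\frac{\partial^i\partial^k}{\partial\omega_1^i\,\partial\omega_2^k}\Phi_{X,Y}(\omega_1, \omega_2)\bigg|_{\omega_1=0,\,\omega_2=0}. \tag{6.37}$$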

Example 6.19 

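Suppose U and V are independent zero-mean, unit-variance Gaussian random variables, and let

$$X = U + V \qquad Y = 2U + V.$$

Find the joint characteristic function of X and Y, and find E[XY].
The joint characteristic function of X and Y is

$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1X + \omega_2Y)}] = E[e^{j\omega_1(U+V)}e^{j\omega_2(2U+V)}] = E[e^{j((\omega_1 + 2\omega_2)U + (\omega_1 + \omega_2)V)}].$$

Since U and V are independent random variables, the joint characteristic function of U and V is
equal to the product of the marginal characteristic functions:

$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1 + 2\omega_2)U}]\,E[e^{j(\omega_1 + \omega_2)V}] = \Phi_U(\omega_1 + 2\omega_2)\,\Phi_V(\omega_1 + \omega_2)$$
$$= e^{-\frac{1}{2}(\omega_1 + 2\omega_2)^2}\,e^{-\frac{1}{2}(\omega_1 + \omega_2)^2} = \exp\left\{-\tfrac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)\right\},$$

where the marginal characteristic functions were obtained from Table 4.1.
The correlation E[XY] is found from Eq. (6.37) with i = 1 and k = 1:

$$E[XY] = \frac{1}{j^2}\frac{\partial^2}{\partial\omega_1\,\partial\omega_2}\Phi_{X,Y}(\omega_1, \omega_2)\bigg|_{\omega_1=0,\,\omega_2=0}$$
$$= -\exp\left\{-\tfrac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)\right\}[6\omega_1 + 10\omega_2]\,\tfrac{1}{4}\,[4\omega_1 + 6\omega_2] + \tfrac{1}{2}\exp\left\{-\tfrac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)\right\}[6]\,\bigg|_{\omega_1=0,\,\omega_2=0}$$
$$= 3.$$

You should verify this answer by evaluating E[XY] = E[(U + V)(2U + V)] directly.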
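
The derivative in Example 6.19 can also be checked numerically. The sketch below is our own illustration, not part of the original example: it approximates the mixed partial derivative of the joint characteristic function at the origin by a central difference. The step size `h` and the function names are assumptions chosen for the illustration.

```python
import math

def phi(w1, w2):
    """Joint characteristic function of Example 6.19 (real-valued in this case)."""
    return math.exp(-0.5 * (2.0 * w1 ** 2 + 6.0 * w1 * w2 + 5.0 * w2 ** 2))

def mixed_partial_at_origin(f, h=1e-3):
    """Central-difference estimate of d^2 f / (dw1 dw2) at (0, 0)."""
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)

# Eq. (6.37) with i = k = 1: E[XY] = (1/j^2) * mixed partial, and 1/j^2 = -1.
e_xy = -mixed_partial_at_origin(phi)
print(e_xy)
```

The estimate agrees with the direct computation E[(U + V)(2U + V)] = 2E[U^2] + 3E[UV] + E[V^2] = 2 + 0 + 1 = 3 to within the truncation error of the finite difference.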


*6.3.4 Diagonalization of Covariance Matrix 


Let X be a random vector with covariance $\mathbf{K}_X$. We are interested in finding an n × n
matrix A such that Y = AX has a covariance matrix that is diagonal. The components
of Y are then uncorrelated.
We saw that $\mathbf{K}_X$ is a real-valued symmetric matrix. In Appendix C we state results
from linear algebra that $\mathbf{K}_X$ is then a diagonalizable matrix, that is, there is a matrix P
such that:

$$\mathbf{P}^T\mathbf{K}_X\mathbf{P} = \mathbf{\Lambda} \quad \text{and} \quad \mathbf{P}^T\mathbf{P} = \mathbf{I} \tag{6.38a}$$

where $\mathbf{\Lambda}$ is a diagonal matrix and I is the identity matrix. Therefore if we let $\mathbf{A} = \mathbf{P}^T$,
then from Eq. (6.30b) we obtain a diagonal $\mathbf{K}_Y$.
We now show how P is obtained. First, we find the eigenvalues and eigenvectors
of $\mathbf{K}_X$ from:

$$\mathbf{K}_X\mathbf{e}_i = \lambda_i\mathbf{e}_i \tag{6.38b}$$

where the $\mathbf{e}_i$ are n × 1 column vectors.† We can normalize each eigenvector $\mathbf{e}_i$ so that
$\mathbf{e}_i^T\mathbf{e}_i$, the sum of the squares of its components, is 1. The normalized eigenvectors are
then orthonormal, that is,

$$\mathbf{e}_i^T\mathbf{e}_j = \delta_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases} \tag{6.38c}$$


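Let P be the matrix whose columns are the eigenvectors of $\mathbf{K}_X$ and let $\mathbf{\Lambda}$ be the diagonal
matrix of eigenvalues:

$$\mathbf{P} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n] \qquad \mathbf{\Lambda} = \mathrm{diag}[\lambda_i].$$

From Eq. (6.38b) we have:

$$\mathbf{K}_X\mathbf{P} = \mathbf{K}_X[\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n] = [\mathbf{K}_X\mathbf{e}_1, \mathbf{K}_X\mathbf{e}_2, \ldots, \mathbf{K}_X\mathbf{e}_n] = [\lambda_1\mathbf{e}_1, \lambda_2\mathbf{e}_2, \ldots, \lambda_n\mathbf{e}_n] = \mathbf{P}\mathbf{\Lambda} \tag{6.39a}$$

where the second equality follows from the fact that each column of $\mathbf{K}_X\mathbf{P}$ is obtained
by multiplying a column of P by $\mathbf{K}_X$. By premultiplying both sides of the above equation
by $\mathbf{P}^T$, we obtain:

$$\mathbf{P}^T\mathbf{K}_X\mathbf{P} = \mathbf{P}^T\mathbf{P}\mathbf{\Lambda} = \mathbf{\Lambda}. \tag{6.39b}$$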


† See Appendix C.
We conclude that if we let $\mathbf{A} = \mathbf{P}^T$, and

$$\mathbf{Y} = \mathbf{A}\mathbf{X} = \mathbf{P}^T\mathbf{X}, \tag{6.40a}$$

then the random variables in Y are uncorrelated since

$$\mathbf{K}_Y = \mathbf{P}^T\mathbf{K}_X\mathbf{P} = \mathbf{\Lambda}. \tag{6.40b}$$

In summary, any covariance matrix $\mathbf{K}_X$ can be diagonalized by a linear transformation.
The matrix A in the transformation is obtained from the eigenvectors of $\mathbf{K}_X$.
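
The procedure can be sketched for the n = 2 case, where the orthonormal eigenvectors of a symmetric matrix are the columns of a rotation matrix and the rotation angle has a closed form. This is our own illustration in plain Python, not the book's code: the covariance matrix `K_X` is a hypothetical example, and the closed-form angle is a convenience for n = 2 only (for larger n one would call a numerical eigenvalue routine).

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def eigenvector_matrix_2x2(K):
    """Return P whose columns are orthonormal eigenvectors of symmetric 2x2 K."""
    a, b, d = K[0][0], K[0][1], K[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)   # angle that zeroes the off-diagonal term
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

K_X = [[2.0, 1.0],
       [1.0, 2.0]]          # hypothetical covariance matrix

P = eigenvector_matrix_2x2(K_X)
A = transpose(P)                              # A = P^T, as in Eq. (6.40a)
K_Y = matmul(matmul(A, K_X), transpose(A))    # Eq. (6.30b); should equal Lambda

for row in K_Y:
    print([round(v, 6) for v in row])
```

For this choice of `K_X` the eigenvalues are 3 and 1, so `K_Y` is diag(3, 1) up to rounding error.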

Equation (6.40b) provides insight into the invertibility of $\mathbf{K}_X$ and $\mathbf{K}_Y$. From linear
algebra we know that the determinant of a product of n × n matrices is the product
of the determinants, so:

$$\det \mathbf{K}_Y = \det \mathbf{P}^T \det \mathbf{K}_X \det \mathbf{P} = \det \mathbf{\Lambda} = \lambda_1\lambda_2\cdots\lambda_n,$$

where we used the fact that $\det \mathbf{P}^T \det \mathbf{P} = \det \mathbf{I} = 1$, so that $\det \mathbf{K}_X = \det \mathbf{K}_Y$. Recall that a matrix is invertible
if and only if its determinant is nonzero. Therefore $\mathbf{K}_X$ is not invertible if and only if
one or more of the eigenvalues of $\mathbf{K}_X$ is zero.

Now suppose that one of the eigenvalues is zero, say $\lambda_k = 0$. Since $VAR[Y_k] = \lambda_k = 0$,
$Y_k$ equals its mean with probability 1; taking X to have zero mean, $Y_k = 0$. But $Y_k$ is defined as a linear combination, so

$$0 = Y_k = a_{k1}X_1 + a_{k2}X_2 + \cdots + a_{kn}X_n.$$

We conclude that the components of X are linearly dependent. Therefore, one or more
of the components in X are redundant and can be expressed as a linear combination of
the other components.

It is interesting to look at the vector X expressed in terms of Y. Multiply both
sides of Eq. (6.40a) by P and use the fact that $\mathbf{P}\mathbf{P}^T = \mathbf{I}$:

$$\mathbf{X} = \mathbf{P}\mathbf{P}^T\mathbf{X} = \mathbf{P}\mathbf{Y} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n]\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \sum_{k=1}^{n} Y_k\mathbf{e}_k. \tag{6.41}$$

This equation is called the Karhunen-Loeve expansion. The equation shows that a random
vector X can be expressed as a weighted sum of the eigenvectors of $\mathbf{K}_X$, where the coefficients
are uncorrelated random variables $Y_k$. Furthermore, the eigenvectors form an orthonormal
set. Note that if any of the eigenvalues are zero, $VAR[Y_k] = \lambda_k = 0$, then $Y_k = 0$,
and the corresponding term can be dropped from the expansion in Eq. (6.41). In Chapter
10, we will see that this expansion is very useful in the processing of random signals.


JOINTLY GAUSSIAN RANDOM VECTORS 


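The random variables $X_1, X_2, \ldots, X_n$ are said to be jointly Gaussian if their joint pdf is
given by

$$f_{\mathbf{X}}(\mathbf{x}) = f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \frac{\exp\left\{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T\mathbf{K}^{-1}(\mathbf{x} - \mathbf{m})\right\}}{(2\pi)^{n/2}|\mathbf{K}|^{1/2}}, \tag{6.42a}$$

where x and m are column vectors defined by

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad \mathbf{m} = \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_n \end{bmatrix} = \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}$$

and K is the covariance matrix that is defined by

$$\mathbf{K} = \begin{bmatrix} VAR(X_1) & COV(X_1, X_2) & \cdots & COV(X_1, X_n) \\ COV(X_2, X_1) & VAR(X_2) & \cdots & COV(X_2, X_n) \\ \vdots & \vdots & & \vdots \\ COV(X_n, X_1) & \cdots & & VAR(X_n) \end{bmatrix}. \tag{6.42b}$$

The $(\cdot)^T$ in Eq. (6.42a) denotes the transpose of a matrix or vector. Note that the covariance
matrix is a symmetric matrix since $COV(X_i, X_j) = COV(X_j, X_i)$.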

Equation (6.42a) shows that the pdf of jointly Gaussian random variables is com- 
pletely specified by the individual means and variances and the pairwise covariances. It 
can be shown using the joint characteristic function that all the marginal pdf’s associat- 
ed with Eq. (6.42a) are also Gaussian and that these too are completely specified by 
the same set of means, variances, and covariances. 


Example 6.20 


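Verify that the two-dimensional Gaussian pdf given in Eq. (5.61a) has the form of Eq. (6.42a).
The covariance matrix for the two-dimensional case is given by

$$\mathbf{K} = \begin{bmatrix} \sigma_1^2 & \rho_{X,Y}\sigma_1\sigma_2 \\ \rho_{X,Y}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix},$$

where we have used the fact that $COV(X_1, X_2) = \rho_{X,Y}\sigma_1\sigma_2$. The determinant of K is
$\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)$ so the denominator of the pdf has the correct form. The inverse of the covariance
matrix is also a real symmetric matrix:

$$\mathbf{K}^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)}\begin{bmatrix} \sigma_2^2 & -\rho_{X,Y}\sigma_1\sigma_2 \\ -\rho_{X,Y}\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}.$$

The term in the exponent is therefore

$$\frac{1}{\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)}\,(x - m_1,\; y - m_2)\begin{bmatrix} \sigma_2^2 & -\rho_{X,Y}\sigma_1\sigma_2 \\ -\rho_{X,Y}\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}\begin{bmatrix} x - m_1 \\ y - m_2 \end{bmatrix}$$
$$= \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)}\,(x - m_1,\; y - m_2)\begin{bmatrix} \sigma_2^2(x - m_1) - \rho_{X,Y}\sigma_1\sigma_2(y - m_2) \\ -\rho_{X,Y}\sigma_1\sigma_2(x - m_1) + \sigma_1^2(y - m_2) \end{bmatrix}$$
$$= \frac{((x - m_1)/\sigma_1)^2 - 2\rho_{X,Y}((x - m_1)/\sigma_1)((y - m_2)/\sigma_2) + ((y - m_2)/\sigma_2)^2}{(1 - \rho_{X,Y}^2)}.$$

Thus the two-dimensional pdf has the form of Eq. (6.42a).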
Example 6.21


The vector of random variables (X, Y, Z) is jointly Gaussian with zero means and covariance matrix:

$$\mathbf{K} = \begin{bmatrix} VAR(X) & COV(X, Y) & COV(X, Z) \\ COV(Y, X) & VAR(Y) & COV(Y, Z) \\ COV(Z, X) & COV(Z, Y) & VAR(Z) \end{bmatrix} = \begin{bmatrix} 1.0 & 0.2 & 0.3 \\ 0.2 & 1.0 & 0.4 \\ 0.3 & 0.4 & 1.0 \end{bmatrix}.$$


Find the marginal pdf of X and Z. 

We can solve this problem two ways. The first involves integrating the pdf directly to obtain 
the marginal pdf. The second involves using the fact that the marginal pdf for X and Z is also Gauss- 
ian and has the same set of means, variances, and covariances. We will use the second approach. 

The pair (X, Z) has zero-mean vector and covariance matrix:

$$\mathbf{K}' = \begin{bmatrix} VAR(X) & COV(X, Z) \\ COV(Z, X) & VAR(Z) \end{bmatrix} = \begin{bmatrix} 1.0 & 0.3 \\ 0.3 & 1.0 \end{bmatrix}.$$

The joint pdf of X and Z is found by substituting a zero-mean vector and this covariance matrix
into Eq. (6.42a).
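
The second approach of Example 6.21 can be sketched in a few lines: select the rows and columns of the covariance matrix that belong to X and Z, then evaluate Eq. (6.42a) with n = 2. The code below is our own illustration; the function name `gaussian_pdf_2d` and the index list `keep` are assumptions introduced for the sketch.

```python
import math

def gaussian_pdf_2d(x, m, K):
    """Evaluate the jointly Gaussian pdf of Eq. (6.42a) for n = 2."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    # Inverse of a 2x2 matrix written out explicitly.
    inv = [[K[1][1] / det, -K[0][1] / det],
           [-K[1][0] / det, K[0][0] / det]]
    d = [x[0] - m[0], x[1] - m[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Covariance matrix of (X, Y, Z) from Example 6.21.
K_full = [[1.0, 0.2, 0.3],
          [0.2, 1.0, 0.4],
          [0.3, 0.4, 1.0]]

keep = [0, 2]                                   # positions of X and Z
K_xz = [[K_full[i][j] for j in keep] for i in keep]
m_xz = [0.0, 0.0]

p_origin = gaussian_pdf_2d([0.0, 0.0], m_xz, K_xz)
print(K_xz, p_origin)
```

No integration over y is needed: the marginal pdf is determined by the means, variances, and covariance of the retained components.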


Example 6.22 Independence of Uncorrelated Jointly Gaussian Random Variables 


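Suppose $X_1, X_2, \ldots, X_n$ are jointly Gaussian random variables with $COV(X_i, X_j) = 0$ for $i \neq j$.
Show that $X_1, X_2, \ldots, X_n$ are independent random variables.
From Eq. (6.42b) we see that the covariance matrix is a diagonal matrix:

$$\mathbf{K} = \mathrm{diag}[VAR(X_i)] = \mathrm{diag}[\sigma_i^2].$$

Therefore

$$\mathbf{K}^{-1} = \mathrm{diag}\left[\frac{1}{\sigma_i^2}\right]$$

and

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}[(x_i - m_i)/\sigma_i]^2\right\}}{(2\pi)^{n/2}|\mathbf{K}|^{1/2}} = \prod_{i=1}^{n}\frac{\exp\left\{-\frac{1}{2}[(x_i - m_i)/\sigma_i]^2\right\}}{\sqrt{2\pi\sigma_i^2}} = \prod_{i=1}^{n} f_{X_i}(x_i).$$

Thus $X_1, X_2, \ldots, X_n$ are independent Gaussian random variables.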


Example 6.23 Conditional pdf of Gaussian Random Variable 
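Find the conditional pdf of $X_n$ given $X_1, X_2, \ldots, X_{n-1}$.

Let $\mathbf{K}_n$ be the covariance matrix for $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ and $\mathbf{K}_{n-1}$ be the covariance
matrix for $\mathbf{X}_{n-1} = (X_1, X_2, \ldots, X_{n-1})$. Let $\mathbf{Q}_n = \mathbf{K}_n^{-1}$ and $\mathbf{Q}_{n-1} = \mathbf{K}_{n-1}^{-1}$, then the latter matrices are
submatrices of the former matrices as shown below:

$$\mathbf{K}_n = \left[\begin{array}{c|c} \mathbf{K}_{n-1} & \begin{matrix} K_{1n} \\ K_{2n} \\ \vdots \end{matrix} \\ \hline \begin{matrix} K_{1n} & K_{2n} & \cdots \end{matrix} & K_{nn} \end{array}\right] \qquad \mathbf{Q}_n = \left[\begin{array}{c|c} \mathbf{Q}_{n-1} & \begin{matrix} Q_{1n} \\ Q_{2n} \\ \vdots \end{matrix} \\ \hline \begin{matrix} Q_{1n} & Q_{2n} & \cdots \end{matrix} & Q_{nn} \end{array}\right]$$

Below we will use the subscript n or n − 1 to distinguish between the two random vectors and
their parameters. The conditional pdf of $X_n$ given $X_1, X_2, \ldots, X_{n-1}$ is given by:

$$f_{X_n}(x_n \mid x_1, \ldots, x_{n-1}) = \frac{f_{\mathbf{X}_n}(\mathbf{x}_n)}{f_{\mathbf{X}_{n-1}}(\mathbf{x}_{n-1})}$$
$$= \frac{\exp\left\{-\frac{1}{2}(\mathbf{x}_n - \mathbf{m}_n)^T\mathbf{Q}_n(\mathbf{x}_n - \mathbf{m}_n)\right\}}{(2\pi)^{n/2}|\mathbf{K}_n|^{1/2}}\;\frac{(2\pi)^{(n-1)/2}|\mathbf{K}_{n-1}|^{1/2}}{\exp\left\{-\frac{1}{2}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})^T\mathbf{Q}_{n-1}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})\right\}}$$
$$= \frac{\exp\left\{-\frac{1}{2}(\mathbf{x}_n - \mathbf{m}_n)^T\mathbf{Q}_n(\mathbf{x}_n - \mathbf{m}_n) + \frac{1}{2}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})^T\mathbf{Q}_{n-1}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})\right\}}{\sqrt{2\pi}\,|\mathbf{K}_n|^{1/2}/|\mathbf{K}_{n-1}|^{1/2}}.$$

In Problem 6.60 we show that the terms in the above expression are given by:

$$\tfrac{1}{2}(\mathbf{x}_n - \mathbf{m}_n)^T\mathbf{Q}_n(\mathbf{x}_n - \mathbf{m}_n) - \tfrac{1}{2}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})^T\mathbf{Q}_{n-1}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1}) = \tfrac{1}{2}Q_{nn}\{(x_n - m_n) + B\}^2 - \tfrac{1}{2}Q_{nn}B^2 \tag{6.43}$$

where $B = \dfrac{1}{Q_{nn}}\displaystyle\sum_{j=1}^{n-1} Q_{jn}(x_j - m_j)$ and $|\mathbf{K}_n|/|\mathbf{K}_{n-1}| = 1/Q_{nn}$.

This implies that, given $x_1, \ldots, x_{n-1}$, $X_n$ has mean $m_n - B$ and variance $1/Q_{nn}$. The term $Q_{nn}B^2$ does not depend on $x_n$ and is part of the
normalization constant. We therefore conclude that:

$$f_{X_n}(x_n \mid x_1, \ldots, x_{n-1}) = \frac{\exp\left\{-\dfrac{Q_{nn}}{2}\left(x_n - m_n + \dfrac{1}{Q_{nn}}\displaystyle\sum_{j=1}^{n-1} Q_{jn}(x_j - m_j)\right)^2\right\}}{\sqrt{2\pi/Q_{nn}}}.$$

We see that the conditional mean of $X_n$ is a linear function of the “observations”
$x_1, x_2, \ldots, x_{n-1}$.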
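
For n = 2 the result of Example 6.23 can be compared with the familiar bivariate Gaussian formulas. The sketch below is our own check, with a hypothetical covariance matrix, mean vector, and observed value: it computes the conditional mean $m_2 - B$ and variance $1/Q_{22}$ from the entries of $\mathbf{Q}_2 = \mathbf{K}_2^{-1}$.

```python
def conditional_gaussian_2d(K, m, x1):
    """Conditional mean and variance of X2 given X1 = x1, using Q = K^{-1}."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    Q12 = -K[0][1] / det          # off-diagonal entry of the inverse
    Q22 = K[0][0] / det           # lower-right entry of the inverse
    B = Q12 * (x1 - m[0]) / Q22   # B = (1/Q_nn) * sum_j Q_jn (x_j - m_j), with n = 2
    return m[1] - B, 1.0 / Q22

K = [[1.0, 0.3],
     [0.3, 1.0]]                  # hypothetical: unit variances, rho = 0.3
m = [0.0, 0.0]

cond_mean, cond_var = conditional_gaussian_2d(K, m, x1=2.0)
print(cond_mean, cond_var)
```

With unit variances and correlation coefficient 0.3, the bivariate formulas give conditional mean 0.3 × 2.0 = 0.6 and conditional variance 1 − 0.3² = 0.91, which is what the expressions in terms of Q produce.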


*6.4.1 Linear Transformation of Gaussian Random Variables 


A very important property of jointly Gaussian random variables is that the linear trans- 
formation of any n jointly Gaussian random variables results in n random variables that 
are also jointly Gaussian. This is easy to show using the matrix notation in Eq. (6.42a). 
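Let $\mathbf{X} = (X_1, \ldots, X_n)$ be jointly Gaussian with covariance matrix $\mathbf{K}_X$ and mean vector
$\mathbf{m}_X$ and define $\mathbf{Y} = (Y_1, \ldots, Y_n)$ by

$$\mathbf{Y} = \mathbf{A}\mathbf{X},$$

where A is an invertible n × n matrix. From Eq. (5.60) we know that the pdf of Y is
given by

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{f_{\mathbf{X}}(\mathbf{A}^{-1}\mathbf{y})}{|\mathbf{A}|} = \frac{\exp\left\{-\frac{1}{2}(\mathbf{A}^{-1}\mathbf{y} - \mathbf{m}_X)^T\mathbf{K}_X^{-1}(\mathbf{A}^{-1}\mathbf{y} - \mathbf{m}_X)\right\}}{(2\pi)^{n/2}|\mathbf{A}||\mathbf{K}_X|^{1/2}}. \tag{6.44}$$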


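From elementary properties of matrices we have that

$$(\mathbf{A}^{-1}\mathbf{y} - \mathbf{m}_X) = \mathbf{A}^{-1}(\mathbf{y} - \mathbf{A}\mathbf{m}_X)$$

and

$$(\mathbf{A}^{-1}\mathbf{y} - \mathbf{m}_X)^T = (\mathbf{y} - \mathbf{A}\mathbf{m}_X)^T(\mathbf{A}^{-1})^T.$$

The argument in the exponential is therefore equal to

$$(\mathbf{y} - \mathbf{A}\mathbf{m}_X)^T(\mathbf{A}^{-1})^T\mathbf{K}_X^{-1}\mathbf{A}^{-1}(\mathbf{y} - \mathbf{A}\mathbf{m}_X) = (\mathbf{y} - \mathbf{A}\mathbf{m}_X)^T(\mathbf{A}\mathbf{K}_X\mathbf{A}^T)^{-1}(\mathbf{y} - \mathbf{A}\mathbf{m}_X)$$

since $(\mathbf{A}^{-1})^T\mathbf{K}_X^{-1}\mathbf{A}^{-1} = (\mathbf{A}\mathbf{K}_X\mathbf{A}^T)^{-1}$. Letting $\mathbf{K}_Y = \mathbf{A}\mathbf{K}_X\mathbf{A}^T$ and $\mathbf{m}_Y = \mathbf{A}\mathbf{m}_X$ and noting that
$\det(\mathbf{K}_Y) = \det(\mathbf{A}\mathbf{K}_X\mathbf{A}^T) = \det(\mathbf{A})\det(\mathbf{K}_X)\det(\mathbf{A}^T) = \det(\mathbf{A})^2\det(\mathbf{K}_X)$, we finally
have that the pdf of Y is

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{e^{-(1/2)(\mathbf{y} - \mathbf{m}_Y)^T\mathbf{K}_Y^{-1}(\mathbf{y} - \mathbf{m}_Y)}}{(2\pi)^{n/2}|\mathbf{K}_Y|^{1/2}}. \tag{6.45}$$

Thus the pdf of Y has the form of Eq. (6.42a) and therefore $Y_1, \ldots, Y_n$ are jointly
Gaussian random variables with mean vector and covariance matrix:

$$\mathbf{m}_Y = \mathbf{A}\mathbf{m}_X \quad \text{and} \quad \mathbf{K}_Y = \mathbf{A}\mathbf{K}_X\mathbf{A}^T.$$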


This result is consistent with the mean vector and covariance matrix we obtained be- 
fore in Eqs. (6.30a) and (6.30b). 

In many problems we wish to transform X to a vector Y of independent Gaussian
random variables. Since $\mathbf{K}_X$ is a symmetric matrix, it is always possible to find a matrix
A such that $\mathbf{A}\mathbf{K}_X\mathbf{A}^T = \mathbf{\Lambda}$ is a diagonal matrix. (See Section 6.6.) For such a matrix A,
the pdf of Y will be

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{e^{-(1/2)(\mathbf{y} - \mathbf{n})^T\mathbf{\Lambda}^{-1}(\mathbf{y} - \mathbf{n})}}{(2\pi)^{n/2}|\mathbf{\Lambda}|^{1/2}} = \frac{\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}(y_i - n_i)^2/\lambda_i\right\}}{[(2\pi\lambda_1)(2\pi\lambda_2)\cdots(2\pi\lambda_n)]^{1/2}}, \tag{6.46}$$

where $\lambda_1, \ldots, \lambda_n$ are the diagonal components of $\mathbf{\Lambda}$. We assume that these values are
all nonzero. The above pdf implies that $Y_1, \ldots, Y_n$ are independent random variables
with means $n_i$ and variances $\lambda_i$. In conclusion, it is possible to linearly transform a vector
of jointly Gaussian random variables into a vector of independent Gaussian random
variables.

It is always possible to select the matrix A that diagonalizes K so that det(A) = 1. 
The transformation AX then corresponds to a rotation of the coordinate system so that 
the principal axes of the ellipsoid corresponding to the pdf are aligned to the axes of the 
system. Example 5.48 provides an n = 2 example of rotation. 

In computer simulation models we frequently need to generate jointly Gaussian
random vectors with specified covariance matrix and mean vector. Suppose that
$\mathbf{X} = (X_1, X_2, \ldots, X_n)$ has components that are independent zero-mean, unit-variance Gaussian
random variables, so its mean vector is 0 and its covariance matrix is the identity matrix
I. Let K denote the desired covariance matrix. Using the methods discussed in Section
6.3, it is possible to find a matrix A so that $\mathbf{A}^T\mathbf{A} = \mathbf{K}$. Therefore $\mathbf{Y} = \mathbf{A}^T\mathbf{X}$ has zero
mean vector and covariance K. From Eq. (6.45) we have that Y is also a jointly Gaussian
random vector with zero mean vector and covariance K. If we require a nonzero
mean vector m, we use Y + m.
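
A minimal simulation sketch of this recipe follows. It is our own illustration and makes one substitution that should be stated plainly: instead of the eigenvector method of Section 6.3, it uses a Cholesky factorization $\mathbf{K} = \mathbf{L}\mathbf{L}^T$, so that $\mathbf{A} = \mathbf{L}^T$ satisfies $\mathbf{A}^T\mathbf{A} = \mathbf{K}$. The covariance matrix is borrowed from Example 6.21; the function names, seed, and sample size are assumptions.

```python
import math
import random

def cholesky(K):
    """Lower-triangular L with L L^T = K, for symmetric positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def gaussian_vector(L, m, rng):
    """Return Y = L X + m, where X has independent N(0, 1) components."""
    n = len(L)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * x[k] for k in range(i + 1)) + m[i] for i in range(n)]

K = [[1.0, 0.2, 0.3],
     [0.2, 1.0, 0.4],
     [0.3, 0.4, 1.0]]             # desired covariance (from Example 6.21)
m = [0.0, 0.0, 0.0]

L = cholesky(K)
rng = random.Random(1)            # fixed seed so the run is repeatable
samples = [gaussian_vector(L, m, rng) for _ in range(20000)]

# Sample estimate of COV(Y1, Y3); it should be close to K[0][2] = 0.3.
c13 = sum(y[0] * y[2] for y in samples) / len(samples)
print(c13)
```

The sample covariance only approximates K; the agreement improves as the number of generated vectors grows.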


Example 6.24 Sum of Jointly Gaussian Random Variables 
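Let $X_1, X_2, \ldots, X_n$ be jointly Gaussian random variables with joint pdf given by Eq. (6.42a). Let

$$Z = a_1X_1 + a_2X_2 + \cdots + a_nX_n.$$

We will show that Z is always a Gaussian random variable.
We find the pdf of Z by introducing auxiliary random variables. Let

$$Z_2 = X_2, \quad Z_3 = X_3, \quad \ldots, \quad Z_n = X_n.$$

If we define $\mathbf{Z} = (Z, Z_2, \ldots, Z_n)$, then

$$\mathbf{Z} = \mathbf{A}\mathbf{X}$$

where

$$\mathbf{A} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.$$

From Eq. (6.45) we have that $\mathbf{Z}$ is jointly Gaussian with mean $\mathbf{n} = \mathbf{A}\mathbf{m}$, and covariance matrix
$\mathbf{C} = \mathbf{A}\mathbf{K}\mathbf{A}^T$. Furthermore, it then follows that the marginal pdf of Z is a Gaussian pdf with mean
given by the first component of $\mathbf{n}$ and variance given by the 1-1 component of the covariance
matrix C. By carrying out the above matrix multiplications, we find that

$$E[Z] = \sum_{i=1}^{n} a_iE[X_i] \tag{6.47a}$$

$$VAR[Z] = \sum_{i=1}^{n}\sum_{k=1}^{n} a_ia_k\,COV(X_i, X_k). \tag{6.47b}$$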
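
In matrix form the mean and variance of the linear combination $Z = \mathbf{a}^T\mathbf{X}$ are $\mathbf{a}^T\mathbf{m}$ and $\mathbf{a}^T\mathbf{K}\mathbf{a}$. The short sketch below evaluates both sums directly; it is our own illustration, using the covariance matrix of Example 6.21 and a hypothetical coefficient vector `a`.

```python
def linear_combination_moments(a, m, K):
    """Mean a^T m and variance a^T K a of Z = a_1 X_1 + ... + a_n X_n."""
    n = len(a)
    mean = sum(a[i] * m[i] for i in range(n))
    var = sum(a[i] * a[k] * K[i][k] for i in range(n) for k in range(n))
    return mean, var

K = [[1.0, 0.2, 0.3],
     [0.2, 1.0, 0.4],
     [0.3, 0.4, 1.0]]             # covariance matrix from Example 6.21
m = [0.0, 0.0, 0.0]
a = [1.0, 1.0, 1.0]               # hypothetical coefficients: Z = X + Y + Z'

mean_z, var_z = linear_combination_moments(a, m, K)
print(mean_z, var_z)
```

With all coefficients equal to one, the variance is the sum of every entry of K: the three variances plus twice each covariance, 3 + 2(0.2 + 0.3 + 0.4) = 4.8.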
*6.4.2 Joint Characteristic Function of a Gaussian Random Variable


The joint characteristic function is very useful in developing the properties of jointly
Gaussian random variables. We now show that the joint characteristic function of n
jointly Gaussian random variables $X_1, X_2, \ldots, X_n$ is given by

$$\Phi_{X_1, X_2, \ldots, X_n}(\omega_1, \omega_2, \ldots, \omega_n) = e^{\,j\sum_{i=1}^{n}\omega_im_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{n}\omega_i\omega_k\,COV(X_i, X_k)}, \tag{6.48a}$$

which can be written more compactly as follows:

$$\Phi_{\mathbf{X}}(\boldsymbol{\omega}) \triangleq \Phi_{X_1, X_2, \ldots, X_n}(\omega_1, \omega_2, \ldots, \omega_n) = e^{\,j\boldsymbol{\omega}^T\mathbf{m} - \frac{1}{2}\boldsymbol{\omega}^T\mathbf{K}\boldsymbol{\omega}}, \tag{6.48b}$$

where m is the vector of means and K is the covariance matrix defined in Eq. (6.42b).

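Equation (6.48) can be verified by direct integration (see Problem 6.65). We use
the approach in [Papoulis] to develop Eq. (6.48) by using the result from Example 6.24
that a linear combination of jointly Gaussian random variables is always Gaussian.
Consider the sum

$$Z = a_1X_1 + a_2X_2 + \cdots + a_nX_n.$$

The characteristic function of Z is given by

$$\Phi_Z(\omega) = E[e^{j\omega Z}] = E[e^{j(\omega a_1X_1 + \omega a_2X_2 + \cdots + \omega a_nX_n)}] = \Phi_{X_1, \ldots, X_n}(a_1\omega, a_2\omega, \ldots, a_n\omega).$$

On the other hand, since Z is a Gaussian random variable with mean and variance
given by Eq. (6.47), we have

$$\Phi_Z(\omega) = e^{\,j\omega E[Z] - \frac{1}{2}VAR[Z]\omega^2} = e^{\,j\omega\sum_{i=1}^{n}a_im_i - \frac{1}{2}\omega^2\sum_{i=1}^{n}\sum_{k=1}^{n}a_ia_k\,COV(X_i, X_k)}. \tag{6.49}$$

By equating both expressions for $\Phi_Z(\omega)$ with $\omega = 1$, we finally obtain

$$\Phi_{X_1, X_2, \ldots, X_n}(a_1, a_2, \ldots, a_n) = e^{\,j\sum_{i=1}^{n}a_im_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{n}a_ia_k\,COV(X_i, X_k)} = e^{\,j\mathbf{a}^T\mathbf{m} - \frac{1}{2}\mathbf{a}^T\mathbf{K}\mathbf{a}}. \tag{6.50}$$

By replacing the $a_i$'s with $\omega_i$'s we obtain Eq. (6.48).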

The marginal characteristic function of any subset of the random variables
$X_1, X_2, \ldots, X_n$ can be obtained by setting appropriate $\omega_i$'s to zero. Thus, for example,
the marginal characteristic function of $X_1, X_2, \ldots, X_m$ for m < n is obtained by setting
$\omega_{m+1} = \omega_{m+2} = \cdots = \omega_n = 0$. Note that the resulting characteristic function
again corresponds to that of jointly Gaussian random variables with mean and covariance
terms corresponding to the reduced set $X_1, X_2, \ldots, X_m$.

The derivation leading to Eq. (6.50) suggests an alternative definition for jointly
Gaussian random vectors:

Definition: X is a jointly Gaussian random vector if and only if every linear combination
$Z = \mathbf{a}^T\mathbf{X}$ is a Gaussian random variable.

In Example 6.24 we showed that if X is a jointly Gaussian random vector then the linear
combination $Z = \mathbf{a}^T\mathbf{X}$ is a Gaussian random variable. Suppose that we do not
know the joint pdf of X but we are given that $Z = \mathbf{a}^T\mathbf{X}$ is a Gaussian random variable
for any choice of coefficients $\mathbf{a}^T = (a_1, a_2, \ldots, a_n)$. This implies that Eqs. (6.48) and
(6.49) hold, which together imply Eq. (6.50) which states that X has the characteristic
function of a jointly Gaussian random vector.

The above definition is slightly broader than the definition using the pdf in Eq. (6.42a).
The definition based on the pdf requires that the covariance in the exponent be invertible. 
The above definition leads to the characteristic function of Eq. (6.50) which does not 
require that the covariance be invertible. Thus the above definition allows for cases 
where the covariance matrix is not invertible. 


ESTIMATION OF RANDOM VARIABLES 


In this book we will encounter two basic types of estimation problems. In the first type, we 
are interested in estimating the parameters of one or more random variables, e.g., probabil- 
ities, means, variances, or covariances. In Chapter 1, we stated that relative frequencies can 
be used to estimate the probabilities of events, and that sample averages can be used to es- 
timate the mean and other moments of a random variable. In Chapters 7 and 8 we will 
consider this type of estimation further. In this section, we are concerned with the second 
type of estimation problem, where we are interested in estimating the value of an inacces- 
sible random variable X in terms of the observation of an accessible random variable Y. For 
example, X could be the input to a communication channel and Y could be the observed 
output. In a prediction application, X could be a future value of some quantity and Y its 
present value. 


MAP and ML Estimators 


We have considered estimation problems informally earlier in the book. For example,
in estimating the input of a discrete communications channel we are interested in
finding the most probable input given the observation Y = y, that is, the value of input
x that maximizes P[X = x | Y = y]:

$$\max_x P[X = x \mid Y = y].$$

In general we refer to the above estimator for X in terms of Y as the maximum a posteriori
(MAP) estimator. The a posteriori probability is given by:

$$P[X = x \mid Y = y] = \frac{P[Y = y \mid X = x]\,P[X = x]}{P[Y = y]}$$

and so the MAP estimator requires that we know the a priori probabilities P[X = x].
In some situations we know P[Y = y | X = x] but we do not know the a priori probabilities,
so we select the estimator value x as the value that maximizes the likelihood of
the observed value Y = y:

$$\max_x P[Y = y \mid X = x].$$

We refer to this estimator of X in terms of Y as the maximum likelihood (ML) estimator.
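
The difference between the two rules is easy to see on a small discrete channel. The sketch below is our own illustration: the binary channel, its transition probabilities, and the a priori probabilities are hypothetical numbers chosen so that the two estimators disagree.

```python
def ml_estimate(y, likelihood):
    """Input x that maximizes P[Y = y | X = x]."""
    return max(likelihood, key=lambda x: likelihood[x][y])

def map_estimate(y, likelihood, prior):
    """Input x that maximizes P[Y = y | X = x] P[X = x].

    The denominator P[Y = y] is the same for every x, so it can be dropped.
    """
    return max(likelihood, key=lambda x: likelihood[x][y] * prior[x])

# Hypothetical binary channel: likelihood[x][y] = P[Y = y | X = x].
likelihood = {0: {0: 0.8, 1: 0.2},
              1: {0: 0.2, 1: 0.8}}
prior = {0: 0.9, 1: 0.1}          # hypothetical a priori probabilities P[X = x]

y_observed = 1
x_ml = ml_estimate(y_observed, likelihood)
x_map = map_estimate(y_observed, likelihood, prior)
print(x_ml, x_map)
```

When Y = 1 is observed, the ML rule picks x = 1 because 0.8 > 0.2, while the MAP rule compares 0.2 × 0.9 = 0.18 with 0.8 × 0.1 = 0.08 and picks x = 0: the strong prior in favor of input 0 outweighs the likelihood.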

We can define MAP and ML estimators when X and Y are continuous random
variables by replacing events of the form {Y = y} by {y < Y < y + dy}. If X and Y
are continuous, the MAP estimator for X given the observation Y is given by:

$$\max_x f_X(x \mid Y = y),$$

and the ML estimator for X given the observation Y is given by:

$$\max_x f_Y(y \mid X = x).$$


Example 6.25 Comparison of ML and MAP Estimators 


Let X and Y be the random pair in Example 5.16. Find the MAP and ML estimators for X in
terms of Y.
From Example 5.32, the conditional pdf of X given Y is given by:

$$f_X(x \mid y) = e^{-(x - y)} \quad \text{for } y \le x,$$

which decreases as x increases beyond y. Therefore the MAP estimator is $\hat{X}_{MAP} = y$. On the
other hand, the conditional pdf of Y given X is:

$$f_Y(y \mid x) = \frac{e^{-y}}{1 - e^{-x}} \quad \text{for } 0 < y \le x.$$

As x increases beyond y, the denominator becomes larger so the conditional pdf decreases. Therefore
the ML estimator is $\hat{X}_{ML} = y$. In this example the ML and MAP estimators agree.


Example 6.26 Jointly Gaussian Random Variables 


Find the MAP and ML estimators of X in terms of Y when X and Y are jointly Gaussian random
variables.
The conditional pdf of X given Y is given by:

$$f_X(x \mid y) = \frac{\exp\left\{-\dfrac{1}{2(1 - \rho^2)\sigma_X^2}\left(x - \rho\dfrac{\sigma_X}{\sigma_Y}(y - m_Y) - m_X\right)^2\right\}}{\sqrt{2\pi\sigma_X^2(1 - \rho^2)}},$$

which is maximized by the value of x for which the exponent is zero. Therefore

$$\hat{X}_{MAP} = \rho\frac{\sigma_X}{\sigma_Y}(y - m_Y) + m_X.$$

The conditional pdf of Y given X is:

$$f_Y(y \mid x) = \frac{\exp\left\{-\dfrac{1}{2(1 - \rho^2)\sigma_Y^2}\left(y - \rho\dfrac{\sigma_Y}{\sigma_X}(x - m_X) - m_Y\right)^2\right\}}{\sqrt{2\pi\sigma_Y^2(1 - \rho^2)}},$$

which is also maximized for the value of x for which the exponent is zero:

$$0 = y - \rho\frac{\sigma_Y}{\sigma_X}(x - m_X) - m_Y.$$

The ML estimator for X given Y = y is then:

$$\hat{X}_{ML} = \frac{\sigma_X}{\rho\,\sigma_Y}(y - m_Y) + m_X.$$

Therefore we conclude that $\hat{X}_{ML} \neq \hat{X}_{MAP}$. In other words, knowledge of the a priori probabilities
of X will affect the estimator.
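
A few numbers make the difference between the two Gaussian estimators concrete. The sketch below is our own illustration; the parameter values (zero means, unit variances, correlation coefficient 0.5) and the observed value y = 1 are hypothetical.

```python
def x_map(y, m_x, m_y, sigma_x, sigma_y, rho):
    """MAP estimate of X from Y = y for jointly Gaussian X and Y."""
    return rho * (sigma_x / sigma_y) * (y - m_y) + m_x

def x_ml(y, m_x, m_y, sigma_x, sigma_y, rho):
    """ML estimate of X from Y = y for jointly Gaussian X and Y (requires rho != 0)."""
    return (sigma_x / (rho * sigma_y)) * (y - m_y) + m_x

# Hypothetical parameters.
params = dict(m_x=0.0, m_y=0.0, sigma_x=1.0, sigma_y=1.0, rho=0.5)

est_map = x_map(1.0, **params)
est_ml = x_ml(1.0, **params)
print(est_map, est_ml)
```

Here the MAP estimate is 0.5 and the ML estimate is 2.0. The MAP estimator pulls the estimate toward the mean of X by the factor ρ, while the ML estimator, which ignores the distribution of X, scales the observation by 1/ρ. The two coincide only when |ρ| = 1.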


Minimum MSE Linear Estimator 


The estimate for X is given by a function of the observation $\hat{X} = g(Y)$. In general, the
estimation error, $X - \hat{X} = X - g(Y)$, is nonzero, and there is a cost associated with
the error, c(X − g(Y)). We are usually interested in finding the function g(Y) that
minimizes the expected value of the cost, E[c(X − g(Y))]. For example, if X and Y
are the discrete input and output of a communication channel, and c is zero when
X = g(Y) and one otherwise, then the expected value of the cost corresponds to the
probability of error, that is, that X ≠ g(Y). When X and Y are continuous random
variables, we frequently use the mean square error (MSE) as the cost:

$$e = E[(X - g(Y))^2].$$


In the remainder of this section we focus on this particular cost function. We first con- 
sider the case where g(Y) is constrained to be a linear function of Y, and then consider 
the case where g(Y) can be any function, whether linear or nonlinear. 

First, consider the problem of estimating a random variable X by a constant a so
that the mean square error is minimized:

$$\min_a E[(X - a)^2] = E[X^2] - 2aE[X] + a^2. \tag{6.51}$$

The best a is found by taking the derivative with respect to a, setting the result to zero,
and solving for a. The result is

$$a^* = E[X], \tag{6.52}$$

which makes sense since the expected value of X is the center of mass of the pdf. The
mean square error for this estimator is equal to $E[(X - a^*)^2] = VAR(X)$.
Now consider estimating X by a linear function g(Y) = aY + b:

$$\min_{a,b} E[(X - aY - b)^2]. \tag{6.53a}$$

Equation (6.53a) can be viewed as the approximation of X − aY by the constant b.
This is the minimization posed in Eq. (6.51) and the best b is

$$b^* = E[X - aY] = E[X] - aE[Y]. \tag{6.53b}$$

Substitution into Eq. (6.53a) implies that the best a is found by

$$\min_a E[\{(X - E[X]) - a(Y - E[Y])\}^2].$$

We once again differentiate with respect to a, set the result to zero, and solve for a:

$$0 = \frac{d}{da}E[\{(X - E[X]) - a(Y - E[Y])\}^2] = -2E[\{(X - E[X]) - a(Y - E[Y])\}(Y - E[Y])] = -2(COV(X, Y) - a\,VAR(Y)). \tag{6.54}$$

The best coefficient a is found to be

$$a^* = \frac{COV(X, Y)}{VAR(Y)} = \rho_{X,Y}\frac{\sigma_X}{\sigma_Y},$$

where $\sigma_Y = \sqrt{VAR(Y)}$ and $\sigma_X = \sqrt{VAR(X)}$. Therefore, the minimum mean
square error (mmse) linear estimator for X in terms of Y is

$$\hat{X} = a^*Y + b^* = \rho_{X,Y}\,\sigma_X\frac{Y - E[Y]}{\sigma_Y} + E[X]. \tag{6.55}$$

The term $(Y - E[Y])/\sigma_Y$ is simply a zero-mean, unit-variance version of Y. Thus
$\sigma_X(Y - E[Y])/\sigma_Y$ is a rescaled version of Y that has the variance of the random variable
that is being estimated, namely $\sigma_X^2$. The term E[X] simply ensures that the estimator has
the correct mean. The key term in the above estimator is the correlation coefficient:
$\rho_{X,Y}$ specifies the sign and extent of the estimate relative to $\sigma_X(Y - E[Y])/\sigma_Y$. If X
and Y are uncorrelated (i.e., $\rho_{X,Y} = 0$) then the best estimate for X is its mean, E[X].
On the other hand, if $\rho_{X,Y} = \pm 1$ then the best estimate is equal to $\pm\sigma_X(Y - E[Y])/\sigma_Y + E[X]$.
We call attention to the second equality in Eq. (6.54):

$$E[\{(X - E[X]) - a^*(Y - E[Y])\}(Y - E[Y])] = 0. \tag{6.56}$$

This equation is called the orthogonality condition because it states that the error of
the best linear estimator, the quantity inside the braces, is orthogonal to the observation
Y − E[Y]. The orthogonality condition is a fundamental result in mean square
estimation.

The mean square error of the best linear estimator is

$$e_L^* = E[((X - E[X]) - a^*(Y - E[Y]))^2]$$
$$= E[((X - E[X]) - a^*(Y - E[Y]))(X - E[X])] - a^*E[((X - E[X]) - a^*(Y - E[Y]))(Y - E[Y])]$$
$$= E[((X - E[X]) - a^*(Y - E[Y]))(X - E[X])]$$
$$= VAR(X) - a^*\,COV(X, Y)$$
$$= VAR(X)(1 - \rho_{X,Y}^2) \tag{6.57}$$

where the second equality follows from the orthogonality condition. Note that when
$|\rho_{X,Y}| = 1$, the mean square error is zero. This implies that $P[|X - a^*Y - b^*| = 0]
= P[X = a^*Y + b^*] = 1$, so that X is essentially a linear function of Y.
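
The formulas for a*, b*, and the resulting mean square error depend only on the first and second moments of X and Y, so they are simple to compute. The sketch below is our own illustration; the function name and the numerical moments are hypothetical.

```python
import math

def linear_mmse(mean_x, mean_y, var_x, var_y, cov_xy):
    """Coefficients of the best linear estimator a*Y + b* and its mean square error."""
    a = cov_xy / var_y                       # a* = COV(X, Y) / VAR(Y)
    b = mean_x - a * mean_y                  # b* = E[X] - a* E[Y], Eq. (6.53b)
    rho = cov_xy / math.sqrt(var_x * var_y)  # correlation coefficient
    mse = var_x * (1.0 - rho ** 2)           # Eq. (6.57)
    return a, b, mse

# Hypothetical moments.
a_star, b_star, mse = linear_mmse(mean_x=1.0, mean_y=2.0,
                                  var_x=4.0, var_y=1.0, cov_xy=1.0)
print(a_star, b_star, mse)
```

For these moments the correlation coefficient is 0.5, so the linear estimator removes one quarter of the variance of X: the mean square error is 4(1 − 0.25) = 3, compared with VAR(X) = 4 for the best constant estimator.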
Minimum MSE Estimator


In general the estimator for X that minimizes the mean square error is a nonlinear
function of Y. The estimator g(Y) that best approximates X in the sense of minimizing
mean square error must satisfy

$$\min_{g(\cdot)} E[(X - g(Y))^2].$$

The problem can be solved by using conditional expectation:

$$E[(X - g(Y))^2] = E[E[(X - g(Y))^2 \mid Y]] = \int_{-\infty}^{\infty} E[(X - g(y))^2 \mid Y = y]\,f_Y(y)\,dy.$$

The integrand above is nonnegative for all y; therefore, the integral is minimized by minimizing
$E[(X - g(y))^2 \mid Y = y]$ for each y. But g(y) is a constant as far as the conditional
expectation is concerned, so the problem is equivalent to Eq. (6.51) and the
“constant” that minimizes $E[(X - g(y))^2 \mid Y = y]$ is

$$g^*(y) = E[X \mid Y = y]. \tag{6.58}$$

The function $g^*(y) = E[X \mid Y = y]$ is called the regression curve which simply traces
the conditional expected value of X given the observation Y = y.
The mean square error of the best estimator is:

$$e^* = E[(X - g^*(Y))^2] = \int_{-\infty}^{\infty} E[(X - E[X \mid y])^2 \mid Y = y]\,f_Y(y)\,dy = \int_{-\infty}^{\infty} VAR[X \mid Y = y]\,f_Y(y)\,dy.$$

Linear estimators in general are suboptimal and have larger mean square errors.


Example 6.27 Comparison of Linear and Minimum MSE Estimators 


Let X and Y be the random pair in Example 5.16. Find the best linear and nonlinear estimators
for X in terms of Y, and of Y in terms of X.

Example 5.28 provides the parameters needed for the linear estimator: E[X] = 3/2,
E[Y] = 1/2, VAR[X] = 5/4, VAR[Y] = 1/4, and $\rho_{X,Y} = 1/\sqrt{5}$. Example 5.32 provides the
conditional pdf’s needed to find the nonlinear estimator. The best linear and nonlinear estimators
for X in terms of Y are:

$$\hat{X} = \frac{1}{\sqrt{5}}\,\frac{\sqrt{5}}{2}\,\frac{Y - 1/2}{1/2} + \frac{3}{2} = Y + 1$$

$$E[X \mid y] = \int_y^{\infty} x\,e^{-(x - y)}\,dx = y + 1 \quad \text{and so} \quad E[X \mid Y] = Y + 1.$$

Thus the optimum linear and nonlinear estimators are the same.
FIGURE 6.2
Comparison of linear and nonlinear estimators (vertical axis: estimator for Y given x).

The best linear and nonlinear estimators for Y in terms of X are:

$$\hat{Y} = \frac{1}{\sqrt{5}}\,\frac{1}{2}\,\frac{X - 3/2}{\sqrt{5}/2} + \frac{1}{2} = \frac{X + 1}{5}$$

$$E[Y \mid x] = \int_0^x y\,\frac{e^{-y}}{1 - e^{-x}}\,dy = \frac{1 - e^{-x} - xe^{-x}}{1 - e^{-x}}.$$

The optimum linear and nonlinear estimators are not the same in this case. Figure 6.2 compares
the two estimators. It can be seen that the linear estimator is close to E[Y | x] for lower values of
x, where the joint pdf of X and Y is concentrated, and that it diverges from E[Y | x] for larger
values of x.
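
The comparison can be tabulated directly from the two expressions for the estimators of Y in terms of X, namely (x + 1)/5 and $(1 - e^{-x} - xe^{-x})/(1 - e^{-x})$. The sketch below is our own illustration; the function names and the chosen values of x are assumptions.

```python
import math

def linear_estimate(x):
    """Best linear estimate of Y given X = x in Example 6.27: (x + 1) / 5."""
    return (x + 1.0) / 5.0

def conditional_mean(x):
    """Best (nonlinear) estimate E[Y | x] in Example 6.27, valid for x > 0."""
    e = math.exp(-x)
    return (1.0 - e - x * e) / (1.0 - e)

for x in (0.5, 1.5, 3.0, 6.0):
    print(x, round(linear_estimate(x), 4), round(conditional_mean(x), 4))
```

The conditional mean levels off below 1 as x grows, since Y is constrained to lie between 0 and x and its conditional pdf decays as $e^{-y}$, while the linear estimate keeps increasing without bound.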


Example 6.28 


Let X be uniformly distributed in the interval (−1, 1) and let $Y = X^2$. Find the best linear
estimator for Y in terms of X. Compare its performance to the best estimator.
The mean of X is zero, and its correlation with Y is

$$E[XY] = E[X \cdot X^2] = \int_{-1}^{1} \frac{x^3}{2}\,dx = 0.$$

Therefore COV(X, Y) = 0 and the best linear estimator for Y is E[Y] by Eq. (6.55). The mean
square error of this estimator is VAR(Y) by Eq. (6.57).
The best estimator is given by Eq. (6.58):

$$E[Y \mid X = x] = E[X^2 \mid X = x] = x^2.$$

The mean square error of this estimator is

$$E[(Y - g(X))^2] = E[(X^2 - X^2)^2] = 0.$$

Thus in this problem, the best linear estimator performs poorly while the nonlinear estimator
gives the smallest possible mean square error, zero.
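
To put a number on the performance of the linear estimator in Example 6.28, the mean and variance of $Y = X^2$ can be worked out from the uniform pdf $f_X(x) = 1/2$ on (−1, 1). The following short computation is our own addition:

```latex
\begin{align*}
E[Y] &= E[X^2] = \int_{-1}^{1} \frac{x^2}{2}\,dx = \frac{1}{3}, \\
E[Y^2] &= E[X^4] = \int_{-1}^{1} \frac{x^4}{2}\,dx = \frac{1}{5}, \\
\mathrm{VAR}(Y) &= E[Y^2] - E[Y]^2 = \frac{1}{5} - \frac{1}{9} = \frac{4}{45} \approx 0.089.
\end{align*}
```

The best linear estimator is therefore the constant 1/3, with mean square error 4/45, whereas the nonlinear estimator $x^2$ has zero error.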
Example 6.29 Jointly Gaussian Random Variables


Find the minimum mean square error estimator of X in terms of Y when X and Y are jointly
Gaussian random variables.

The minimum mean square error estimator is given by the conditional expectation of X
given Y. From Eq. (5.63), we see that the conditional expectation of X given Y = y is given by

$$E[X \mid Y = y] = E[X] + \rho_{X,Y}\frac{\sigma_X}{\sigma_Y}(y - E[Y]).$$

This is identical to the best linear estimator. Thus for jointly Gaussian random variables the
minimum mean square error estimator is linear.


Estimation Using a Vector of Observations 


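The MAP, ML, and mean square estimators can be extended to the case where a vector of
observations is available. Here we focus on mean square estimation. We wish to estimate
X by a function g(Y) of a random vector of observations $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$ so that
the mean square error is minimized:

$$\min_{g(\cdot)} E[(X - g(\mathbf{Y}))^2].$$

To simplify the discussion we will assume that X and the $Y_i$ have zero means. The
same derivation that led to Eq. (6.58) leads to the optimum minimum mean square
estimator:

$$g^*(\mathbf{y}) = E[X \mid \mathbf{Y} = \mathbf{y}]. \tag{6.59}$$

The minimum mean square error is then:

$$E[(X - g^*(\mathbf{Y}))^2] = \int_{\mathbb{R}^n} E[(X - E[X \mid \mathbf{Y}])^2 \mid \mathbf{Y} = \mathbf{y}]\,f_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y} = \int_{\mathbb{R}^n} VAR[X \mid \mathbf{Y} = \mathbf{y}]\,f_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y}.$$

Now suppose the estimate is a linear function of the observations:

$$g(\mathbf{Y}) = \sum_{k=1}^{n} a_kY_k = \mathbf{a}^T\mathbf{Y}.$$

The mean square error is now:

$$E[(X - g(\mathbf{Y}))^2] = E\left[\left(X - \sum_{k=1}^{n} a_kY_k\right)^2\right].$$

We take derivatives with respect to $a_k$ and again obtain the orthogonality conditions:

$$E\left[\left(X - \sum_{k=1}^{n} a_kY_k\right)Y_j\right] = 0 \quad \text{for } j = 1, \ldots, n.$$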
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The orthogonality condition becomes: 


n n 
E[XY)] = a| (San )y = Sa,E[Y,¥)] forj =1,...,n. 
k=1 k=1 


We obtain a compact expression by introducing matrix notation: 
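$$E[X\mathbf{Y}] = \mathbf{R}_{\mathbf{Y}}\,\mathbf{a} \quad \text{where } \mathbf{a} = (a_1, a_2, \ldots, a_n)^T, \tag{6.60}$$

where $E[X\mathbf{Y}] = [E[XY_1], E[XY_2], \ldots, E[XY_n]]^T$ and $\mathbf{R}_{\mathbf{Y}}$ is the correlation matrix.
Assuming $\mathbf{R}_{\mathbf{Y}}$ is invertible, the optimum coefficients are:

$$\mathbf{a} = \mathbf{R}_{\mathbf{Y}}^{-1} E[X\mathbf{Y}]. \tag{6.61a}$$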


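We can use the methods from Section 6.3 to invert $\mathbf{R}_{\mathbf{Y}}$. The mean square error of the
optimum linear estimator is:

$$\begin{aligned}
E[(X - \mathbf{a}^T\mathbf{Y})^2] &= E[(X - \mathbf{a}^T\mathbf{Y})X] - E[(X - \mathbf{a}^T\mathbf{Y})\,\mathbf{a}^T\mathbf{Y}] \\
&= E[(X - \mathbf{a}^T\mathbf{Y})X] = \mathrm{VAR}(X) - \mathbf{a}^T E[\mathbf{Y}X].
\end{aligned} \tag{6.61b}$$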


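Now suppose that X has mean $m_X$ and $\mathbf{Y}$ has mean vector $\mathbf{m}_{\mathbf{Y}}$, so our estimator
now has the form:

$$\hat{X} = g(\mathbf{Y}) = \sum_{k=1}^{n} a_k Y_k + b = \mathbf{a}^T\mathbf{Y} + b. \tag{6.62}$$

The same argument that led to Eq. (6.53b) implies that the optimum choice for b is:

$$b = E[X] - \mathbf{a}^T\mathbf{m}_{\mathbf{Y}}.$$

Therefore the optimum linear estimator has the form:

$$\hat{X} = g(\mathbf{Y}) = \mathbf{a}^T(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}}) + m_X = \mathbf{a}^T\mathbf{Z} + m_X$$

where $\mathbf{Z} = \mathbf{Y} - \mathbf{m}_{\mathbf{Y}}$ is a random vector with zero mean vector. The mean square error
for this estimator is:

$$E[(X - g(\mathbf{Y}))^2] = E[(X - \mathbf{a}^T\mathbf{Z} - m_X)^2] = E[(W - \mathbf{a}^T\mathbf{Z})^2]$$

where $W = X - m_X$ has zero mean. We have reduced the general estimation problem
to one with zero mean random variables, i.e., W and $\mathbf{Z}$, which has solution given
by Eq. (6.61a). Therefore the optimum set of linear predictors is given by:

$$\mathbf{a} = \mathbf{R}_{\mathbf{Z}}^{-1} E[W\mathbf{Z}] = \mathbf{K}_{\mathbf{Y}}^{-1} E[(X - m_X)(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})]. \tag{6.63a}$$

The mean square error is:

$$\begin{aligned}
E[(X - \mathbf{a}^T\mathbf{Y} - b)^2] &= E[(W - \mathbf{a}^T\mathbf{Z})W] = \mathrm{VAR}(W) - \mathbf{a}^T E[W\mathbf{Z}] \\
&= \mathrm{VAR}(X) - \mathbf{a}^T E[(X - m_X)(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})].
\end{aligned} \tag{6.63b}$$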
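
The computation in Eqs. (6.62) through (6.63b) can be sketched numerically. The Python fragment below is only an illustration under stated assumptions: the function names `solve_linear_system` and `linear_mmse` are hypothetical, the linear system is solved by ordinary Gaussian elimination rather than by the eigenvector methods of Section 6.3, and the numerical values are made up for the demonstration.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's inputs are not modified.
    aug = [[float(v) for v in matrix[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * a[k] for k in range(i + 1, n))
        a[i] = (aug[i][n] - tail) / aug[i][i]
    return a


def linear_mmse(m_x, m_y, k_y, k_xy, var_x):
    """Return (a, b, mse) for the estimator a^T Y + b.

    k_y  : covariance matrix of Y
    k_xy : vector with entries COV(X, Y_k)
    """
    a = solve_linear_system(k_y, k_xy)                       # Eq. (6.63a)
    b = m_x - sum(ai * mi for ai, mi in zip(a, m_y))         # optimum offset
    mse = var_x - sum(ai * ci for ai, ci in zip(a, k_xy))    # Eq. (6.63b)
    return a, b, mse


# Illustrative (assumed) numbers: two observations, nonzero means.
a, b, mse = linear_mmse(
    m_x=1.0,
    m_y=[1.0, 1.0],
    k_y=[[3.0, 2.0], [2.0, 3.0]],
    k_xy=[2.0, 2.0],
    var_x=2.0,
)
print(a, b, mse)
```

With these assumed inputs the coefficients come out equal for the two observations, as one would expect from the symmetry of the covariance matrix.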


This result is of particular importance in the case where X and Y are jointly Gaussian
random variables. In Example 6.23 we saw that the conditional expected value
of X given Y is a linear function of Y of the form in Eq. (6.62). Therefore in this case
the optimum minimum mean square estimator corresponds to the optimum linear
estimator.


Example 6.30 Diversity Receiver 


A radio receiver has two antennas to receive noisy versions of a signal X. The desired signal X is
a Gaussian random variable with zero mean and variance 2. The signals received in the first
and second antennas are $Y_1 = X + N_1$ and $Y_2 = X + N_2$, where $N_1$ and $N_2$ are zero-mean,
unit-variance Gaussian random variables. In addition, X, $N_1$, and $N_2$ are independent random
variables. Find the optimum mean square error linear estimator for X based on a single antenna
signal and the corresponding mean square error. Compare the results to the optimum mean
square estimator for X based on both antenna signals $\mathbf{Y} = (Y_1, Y_2)$.

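Since all random variables have zero mean, we only need the correlation matrix and the
cross-correlation vector in Eq. (6.61):

$$\begin{aligned}
\mathbf{R}_{\mathbf{Y}} &= \begin{bmatrix} E[Y_1^2] & E[Y_1Y_2] \\ E[Y_1Y_2] & E[Y_2^2] \end{bmatrix}
= \begin{bmatrix} E[(X+N_1)^2] & E[(X+N_1)(X+N_2)] \\ E[(X+N_1)(X+N_2)] & E[(X+N_2)^2] \end{bmatrix} \\
&= \begin{bmatrix} E[X^2]+E[N_1^2] & E[X^2] \\ E[X^2] & E[X^2]+E[N_2^2] \end{bmatrix}
= \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}
\end{aligned}$$

and

$$E[X\mathbf{Y}] = \begin{bmatrix} E[XY_1] \\ E[XY_2] \end{bmatrix} = \begin{bmatrix} E[X^2] \\ E[X^2] \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}.$$

The optimum estimator using a single antenna received signal involves solving the 1 × 1 version
of the above system:

$$\hat{X} = \frac{E[X^2]}{E[X^2] + E[N_1^2]}\,Y_1 = \frac{2}{3}\,Y_1$$

and the associated mean square error is:

$$\mathrm{VAR}(X) - a^*\,\mathrm{COV}(Y_1, X) = 2 - \frac{2}{3}\cdot 2 = \frac{2}{3}.$$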


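The coefficients of the optimum estimator using two antenna signals are:

$$\mathbf{a} = \mathbf{R}_{\mathbf{Y}}^{-1} E[X\mathbf{Y}] = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}^{-1}\begin{bmatrix} 2 \\ 2 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 3 & -2 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.4 \end{bmatrix}$$

and the optimum estimator is:

$$\hat{X} = 0.4Y_1 + 0.4Y_2.$$

The mean square error for the two antenna estimator is:

$$E[(X - \mathbf{a}^T\mathbf{Y})^2] = \mathrm{VAR}(X) - \mathbf{a}^T E[\mathbf{Y}X] = 2 - [0.4,\ 0.4]\begin{bmatrix} 2 \\ 2 \end{bmatrix} = 0.4.$$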
As expected, the two antenna system has a smaller mean square error. Note that the receiver
adds the two received signals and scales the result by 0.4. The sum of the signals is:

$$\hat{X} = 0.4Y_1 + 0.4Y_2 = 0.4(2X + N_1 + N_2) = 0.8\left(X + \frac{N_1 + N_2}{2}\right)$$

so combining the signals keeps the desired signal portion, X, constant while averaging the two
noise signals $N_1$ and $N_2$. The problems at the end of the chapter explore this topic further.
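
The two mean square errors in this example can be checked by simulation. The Python sketch below is an illustration only: the function name `simulate_diversity`, the number of trials, and the seed are arbitrary assumptions, and the Gaussian samples come from Python's built-in `random.gauss` rather than from the generation methods discussed in the text.

```python
import random


def simulate_diversity(num_trials=100_000, seed=1):
    """Estimate the MSE of the one-antenna and two-antenna linear estimators."""
    rng = random.Random(seed)
    sq_err_one = 0.0
    sq_err_two = 0.0
    for _ in range(num_trials):
        x = rng.gauss(0.0, 2.0 ** 0.5)   # desired signal, variance 2
        n1 = rng.gauss(0.0, 1.0)         # unit-variance noise, antenna 1
        n2 = rng.gauss(0.0, 1.0)         # unit-variance noise, antenna 2
        y1, y2 = x + n1, x + n2
        sq_err_one += (x - (2.0 / 3.0) * y1) ** 2    # single-antenna estimator
        sq_err_two += (x - 0.4 * (y1 + y2)) ** 2     # two-antenna estimator
    return sq_err_one / num_trials, sq_err_two / num_trials


mse_one, mse_two = simulate_diversity()
print(mse_one, mse_two)
```

The empirical averages should settle near 2/3 and 0.4, respectively, with fluctuations that shrink as the number of trials grows.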


Example 6.31 Second-Order Prediction of Speech 


Let $X_1, X_2, \ldots$ be a sequence of samples of a speech voltage waveform, and suppose that the
samples are fed into the second-order predictor shown in Fig. 6.3. Find the set of predictor
coefficients a and b that minimize the mean square value of the predictor error when $X_n$ is
estimated by $aX_{n-2} + bX_{n-1}$.

We find the best predictor for $X_1$, $X_2$, and $X_3$ and assume that the situation is identical for
$X_2$, $X_3$, and $X_4$ and so on. It is common practice to model speech samples as having zero mean
and variance $\sigma^2$, and a covariance that does not depend on the specific index of the samples, but
rather on the separation between them:

$$\mathrm{COV}(X_j, X_k) = \rho_{|j-k|}\,\sigma^2.$$


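The equation for the optimum linear predictor coefficients becomes

$$\sigma^2\begin{bmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \sigma^2\begin{bmatrix} \rho_2 \\ \rho_1 \end{bmatrix}.$$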


Equation (6.61a) gives

$$a = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} \quad \text{and} \quad b = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}.$$

FIGURE 6.3
A two-tap linear predictor for processing
speech.

In Problem 6.78, you are asked to show that the mean square error using the above values of a
and b is

$$\sigma^2\left\{1 - \rho_1^2 - \frac{(\rho_1^2 - \rho_2)^2}{1 - \rho_1^2}\right\}. \tag{6.64}$$


Typical values for speech signals are $\rho_1 = .825$ and $\rho_2 = .562$. The mean square value of the
predictor output is then $.281\sigma^2$. The lower variance of the output ($.281\sigma^2$) relative to the input
variance ($\sigma^2$) shows that the linear predictor is effective in anticipating the next sample in terms of
the two previous samples. The order of the predictor can be increased by using more terms in the
linear predictor. Thus a third-order predictor has three terms and involves inverting a 3 × 3
correlation matrix, and an n-th order predictor will involve an n × n matrix. Linear predictive
techniques are used extensively in speech, audio, image and video compression systems. We discuss
linear prediction methods in greater detail in Chapter 10.
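
The predictor coefficients and the normalized mean square error of Eq. (6.64) are easy to evaluate directly. The Python sketch below is only an illustration: the function name `second_order_predictor` is hypothetical, and the returned error is expressed as a fraction of the input variance.

```python
def second_order_predictor(rho1, rho2):
    """Return (a, b, mse / sigma^2) for the two-tap predictor a*X[n-2] + b*X[n-1]."""
    denom = 1.0 - rho1 ** 2
    a = (rho2 - rho1 ** 2) / denom
    b = rho1 * (1.0 - rho2) / denom
    # Normalized form of Eq. (6.64).
    mse_over_var = 1.0 - rho1 ** 2 - (rho1 ** 2 - rho2) ** 2 / denom
    return a, b, mse_over_var


a, b, mse_over_var = second_order_predictor(0.825, 0.562)
# Cross-check against the generic form VAR(X) - a^T E[YX] of Eq. (6.61b).
check = 1.0 - a * 0.562 - b * 0.825
print(a, b, mse_over_var, check)
```

For the speech values quoted above, the sketch gives a normalized mean square error of roughly 0.28, with a negative coefficient on the older sample and a coefficient slightly above one on the most recent sample.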


GENERATING CORRELATED VECTOR RANDOM VARIABLES 


Many applications involve vectors or sequences of correlated random variables. Com- 
puter simulation models of such applications therefore require methods for generating 
such random variables. In this section we present methods for generating vectors of 
random variables with specified covariance matrices. We also discuss the generation of 
jointly Gaussian vector random variables. 


Generating Random Vectors with Specified Covariance Matrix 


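Suppose we wish to generate a random vector $\mathbf{Y}$ with an arbitrary valid covariance matrix
$\mathbf{K}_{\mathbf{Y}}$. Let $\mathbf{Y} = \mathbf{A}^T\mathbf{X}$ as in Example 6.17, where $\mathbf{X}$ is a vector random variable with
components that are uncorrelated, zero mean, and unit variance. $\mathbf{X}$ has covariance matrix
equal to the identity matrix $\mathbf{K}_{\mathbf{X}} = \mathbf{I}$, $\mathbf{m}_{\mathbf{Y}} = \mathbf{A}^T\mathbf{m}_{\mathbf{X}} = \mathbf{0}$, and

$$\mathbf{K}_{\mathbf{Y}} = \mathbf{A}^T\mathbf{K}_{\mathbf{X}}\mathbf{A} = \mathbf{A}^T\mathbf{A}.$$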


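Let $\mathbf{P}$ be the matrix whose columns are the eigenvectors of $\mathbf{K}_{\mathbf{Y}}$ and let $\boldsymbol{\Lambda}$ be the diagonal
matrix of eigenvalues; then from Eq. (6.39b) we have:

$$\mathbf{P}^T\mathbf{K}_{\mathbf{Y}}\mathbf{P} = \mathbf{P}^T\mathbf{P}\boldsymbol{\Lambda} = \boldsymbol{\Lambda}.$$

If we premultiply the above equation by $\mathbf{P}$ and then postmultiply by $\mathbf{P}^T$, we obtain an
expression for an arbitrary covariance matrix $\mathbf{K}_{\mathbf{Y}}$ in terms of its eigenvalues and
eigenvectors:

$$\mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^T = \mathbf{P}\mathbf{P}^T\mathbf{K}_{\mathbf{Y}}\mathbf{P}\mathbf{P}^T = \mathbf{K}_{\mathbf{Y}}. \tag{6.65}$$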
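Define the matrix $\boldsymbol{\Lambda}^{1/2}$ as the diagonal matrix of square roots of the eigenvalues:

$$\boldsymbol{\Lambda}^{1/2} \triangleq \begin{bmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_n} \end{bmatrix}.$$

In Problem 6.53 we show that any covariance matrix $\mathbf{K}_{\mathbf{Y}}$ is positive semi-definite,
which implies that it has nonnegative eigenvalues, and so taking the square root is
always possible. If we now let

$$\mathbf{A} = (\mathbf{P}\boldsymbol{\Lambda}^{1/2})^T \tag{6.66}$$

then

$$\mathbf{A}^T\mathbf{A} = \mathbf{P}\boldsymbol{\Lambda}^{1/2}\boldsymbol{\Lambda}^{1/2}\mathbf{P}^T = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^T = \mathbf{K}_{\mathbf{Y}}.$$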


Therefore Y has the desired covariance matrix Ky. 


Example 6.32 


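Let $\mathbf{X} = (X_1, X_2)$ consist of two zero-mean, unit-variance, uncorrelated random variables. Find
the matrix $\mathbf{A}$ such that $\mathbf{Y} = \mathbf{A}\mathbf{X}$ has covariance matrix

$$\mathbf{K} = \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix}.$$
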
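First we need to find the eigenvalues of $\mathbf{K}$, which are determined from the following equation:

$$\det(\mathbf{K} - \lambda\mathbf{I}) = 0 = \det\begin{bmatrix} 4-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix} = (4-\lambda)^2 - 4 = \lambda^2 - 8\lambda + 12 = (\lambda - 6)(\lambda - 2).$$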


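We find the eigenvalues to be $\lambda_1 = 2$ and $\lambda_2 = 6$. Next we need to find the eigenvectors
corresponding to each eigenvalue:

$$\begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = 2\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

which implies that $2e_1 + 2e_2 = 0$. Thus any vector of the form $[1, -1]^T$ is an eigenvector. We
choose the normalized eigenvector corresponding to $\lambda_1 = 2$ as $\mathbf{e}_1 = [1/\sqrt{2}, -1/\sqrt{2}]^T$. We
similarly find the eigenvector corresponding to $\lambda_2 = 6$ as $\mathbf{e}_2 = [1/\sqrt{2}, 1/\sqrt{2}]^T$.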
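The method developed in Section 6.3 requires that we form the matrix $\mathbf{P}$ whose columns
consist of the eigenvectors of $\mathbf{K}$:

$$\mathbf{P} = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.$$

Next it requires that we form the diagonal matrix with elements equal to the square root of the
eigenvalues:

$$\boldsymbol{\Lambda}^{1/2} = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{6} \end{bmatrix}.$$

The desired matrix is then

$$\mathbf{A} = \mathbf{P}\boldsymbol{\Lambda}^{1/2} = \begin{bmatrix} 1 & \sqrt{3} \\ -1 & \sqrt{3} \end{bmatrix}.$$

You should verify that $\mathbf{K} = \mathbf{A}\mathbf{A}^T$.


Example 6.33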


Use Octave to find the eigenvalues and eigenvectors calculated in the previous example. 

After entering the matrix K, we use the eig(K) function to find the matrix of eigenvectors
P and eigenvalues Λ. We then find A and its transpose $A^T$. Finally we confirm that $A^TA$ gives the
desired covariance matrix.


> K = [4, 2; 2, 4];
> [P, D] = eig(K)
P =
  -0.70711   0.70711
   0.70711   0.70711

D =
   2   0
   0   6

> A = (P*sqrt(D))'
A =
  -1.0000   1.0000
   1.7321   1.7321

> A'
ans =
  -1.0000   1.7321
   1.0000   1.7321

> A'*A
ans =
   4.0000   2.0000
   2.0000   4.0000


The above steps can be used to find the transformation $\mathbf{A}^T$ for any desired covariance
matrix $\mathbf{K}$. The only check required is to ascertain that $\mathbf{K}$ is a valid covariance matrix:
(1) $\mathbf{K}$ is symmetric (trivial); (2) $\mathbf{K}$ has positive eigenvalues (easy to check numerically).
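
The factorization and the two validity checks can also be sketched without a numerical library for the 2 × 2 case. The Python fragment below is an illustration under stated assumptions: the function names `factor_covariance_2x2` and `gram` are hypothetical, the eigenvalues are obtained from the closed-form expression for a symmetric 2 × 2 matrix rather than from a general eigenvalue routine, and zero eigenvalues are tolerated.

```python
import math


def factor_covariance_2x2(k):
    """Return A = (P * Lambda^(1/2))^T so that A^T A = K, for symmetric 2x2 K."""
    (k11, k12), (k21, k22) = k
    if abs(k12 - k21) > 1e-12:
        raise ValueError("K must be symmetric")
    # Closed-form eigenvalues of a symmetric 2x2 matrix, in increasing order.
    center = 0.5 * (k11 + k22)
    radius = math.hypot(0.5 * (k11 - k22), k12)
    eigenvalues = [center - radius, center + radius]
    if eigenvalues[0] < -1e-12:
        raise ValueError("K has a negative eigenvalue")
    # Orthonormal eigenvectors, one per eigenvalue (the columns of P).
    if abs(k12) < 1e-12:
        vectors = [(1.0, 0.0), (0.0, 1.0)] if k11 <= k22 else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for value in eigenvalues:
            vx, vy = k12, value - k11          # solves (K - value*I) v = 0
            norm = math.hypot(vx, vy)
            vectors.append((vx / norm, vy / norm))
    # Row i of A is sqrt(lambda_i) times the i-th eigenvector.
    return [[math.sqrt(max(value, 0.0)) * c for c in vec]
            for value, vec in zip(eigenvalues, vectors)]


def gram(a):
    """Return the matrix product A^T A."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


a_matrix = factor_covariance_2x2([[4.0, 2.0], [2.0, 4.0]])
recovered = gram(a_matrix)
print(a_matrix)
print(recovered)
```

Because eigenvectors are determined only up to sign, the rows of the computed matrix may differ in sign from those printed by another tool, but the product $\mathbf{A}^T\mathbf{A}$ is unaffected.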


6.6.2 Generating Vectors of Jointly Gaussian Random Variables 


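In Section 6.4 we found that if $\mathbf{X}$ is a vector of jointly Gaussian random variables with
covariance $\mathbf{K}_{\mathbf{X}}$, then $\mathbf{Y} = \mathbf{A}\mathbf{X}$ is also jointly Gaussian with covariance matrix
$\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{K}_{\mathbf{X}}\mathbf{A}^T$. If we assume that $\mathbf{X}$ consists of unit-variance, uncorrelated random
variables, then $\mathbf{K}_{\mathbf{X}} = \mathbf{I}$, the identity matrix, and therefore $\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{A}^T$.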

We can use the method from the first part of this section to find A for any desired 
covariance matrix Ky. We generate jointly Gaussian random vectors Y with arbitrary 
covariance matrix Ky and mean vector my as follows: 


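1. Find a matrix $\mathbf{A}$ such that $\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{A}^T$.


2. Use the method from Section 5.10 to generate $\mathbf{X}$ consisting of n independent,
zero-mean, unit-variance Gaussian random variables.


3. Let $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{m}_{\mathbf{Y}}$.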
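
The three steps above can be sketched in Python as follows. This is an illustration only: the function names, the mean vector, the sample count, and the seed are arbitrary assumptions; the matrix is the one found in Example 6.32; and the unit-variance Gaussian samples are drawn with Python's built-in `random.gauss` instead of the method of Section 5.10.

```python
import math
import random


def generate_gaussian_pairs(a, mean, count, seed=0):
    """Generate `count` samples of Y = A X + m_Y with X ~ N(0, I) in two dimensions."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]      # step 2
        y = [sum(a[i][j] * x[j] for j in range(2)) + mean[i]
             for i in range(2)]                              # step 3
        samples.append(y)
    return samples


def sample_mean_and_covariance(samples):
    """Return the sample mean vector and the unbiased sample covariance matrix."""
    count = len(samples)
    mean = [sum(s[i] for s in samples) / count for i in range(2)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (count - 1)
            for j in range(2)] for i in range(2)]
    return mean, cov


# Step 1: A with A A^T = [[4, 2], [2, 4]], taken from Example 6.32.
A = [[1.0, math.sqrt(3.0)], [-1.0, math.sqrt(3.0)]]
MEAN = [1.0, -2.0]   # assumed mean vector, for illustration only

samples = generate_gaussian_pairs(A, MEAN, 20000)
est_mean, est_cov = sample_mean_and_covariance(samples)
print(est_mean)
print(est_cov)
```

The sample mean and sample covariance should be close to the assumed mean vector and to the target covariance matrix, with agreement improving as more samples are generated.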
Example 6.34


The Octave commands below show the necessary steps for generating the Gaussian random
variables with the covariance matrix from Example 6.32.


> U1 = rand(1000, 1);        % Create a 1000-element vector U1.
> U2 = rand(1000, 1);        % Create a 1000-element vector U2.
> R2 = -2*log(U1);           % Find R^2.
> TH = 2*pi*U2;              % Find Θ.
> X1 = sqrt(R2).*sin(TH);    % Generate X1.
> X2 = sqrt(R2).*cos(TH);    % Generate X2.
> Y1 = X1 + sqrt(3)*X2;      % Generate Y1.
> Y2 = -X1 + sqrt(3)*X2;     % Generate Y2.
> plot(Y1, Y2, '+')          % Plot scattergram.


We plotted the $Y_1$ values vs. the $Y_2$ values for 1000 pairs of generated random variables in
a scattergram as shown in Fig. 6.4. Good agreement with the elliptical symmetry of the desired
jointly Gaussian pdf is observed.


FIGURE 6.4 
Scattergram of jointly Gaussian random variables. 
SUMMARY


The joint statistical behavior of a vector of random variables X is specified by 
the joint cumulative distribution function, the joint probability mass function, 
or the joint probability density function. The probability of any event involv- 
ing the joint behavior of these random variables can be computed from these 
functions. 


The statistical behavior of subsets of random variables from a vector X is speci- 
fied by the marginal cdf, marginal pdf, or marginal pmf that can be obtained from 
the joint cdf, joint pdf, or joint pmf of X. 

A set of random variables is independent if the probability of a product-form 
event is equal to the product of the probabilities of the component events. Equiv- 
alent conditions for the independence of a set of random variables are that the 
joint cdf, joint pdf, or joint pmf factors into the product of the corresponding mar- 
ginal functions. 


The statistical behavior of a subset of random variables from a vector X, given 
the exact values of the other random variables in the vector, is specified by the 
conditional cdf, conditional pmf, or conditional pdf. Many problems naturally 
lend themselves to a solution that involves conditioning on the values of some of 
the random variables. In these problems, the expected value of random variables 
can be obtained through the use of conditional expectation. 


The mean vector and the covariance matrix provide summary information about 
a vector random variable. The joint characteristic function contains all of the in- 
formation provided by the joint pdf. 


Transformations of vector random variables generate other vector random vari- 
ables. Standard methods are available for finding the joint distributions of the 
new random vectors. 


The orthogonality condition provides a set of linear equations for finding the 
minimum mean square linear estimate. The best mean square estimator is given 
by the conditional expected value. 


The joint pdf of a vector X of jointly Gaussian random variables is determined by 
the vector of the means and by the covariance matrix. All marginal pdf’s and con- 
ditional pdf’s of subsets of X have Gaussian pdf’s. Any linear function or linear 
transformation of jointly Gaussian random variables will result in a set of jointly 
Gaussian random variables. 


A vector of random variables with an arbitrary covariance matrix can be gener- 
ated by taking a linear transformation of a vector of unit-variance, uncorrelated 
random variables. A vector of Gaussian random variables with an arbitrary co- 
variance matrix can be generated by taking a linear transformation of a vector of 
independent, unit-variance jointly Gaussian random variables. 


CHECKLIST OF IMPORTANT TERMS 


Conditional cdf 

Conditional expectation 
Conditional pdf 

Conditional pmf 

Correlation matrix 

Covariance matrix 

Independent random variables 
Jacobian of a transformation 

Joint cdf 

Joint characteristic function 

Joint pdf 

Joint pmf 

Jointly continuous random variables 
Jointly Gaussian random variables 


ANNOTATED REFERENCES 




Karhunen-Loeve expansion 
MAP estimator 

Marginal cdf 

Marginal pdf 

Marginal pmf 

Maximum likelihood estimator 
Mean square error 

Mean vector 

MMSE linear estimator 
Orthogonality condition 
Product-form event 
Regression curve 

Vector random variables 
PROBLEMS


Section 6.1: Vector Random Variables


6.1. The point $\mathbf{X} = (X, Y, Z)$ is uniformly distributed inside a sphere of radius 1 about the
origin. Find the probability of the following events:
(a) $\mathbf{X}$ is inside a sphere of radius r, r > 0.
(b) $\mathbf{X}$ is inside a cube of length $2/\sqrt{3}$ centered about the origin.
(c) All components of $\mathbf{X}$ are positive.
(d) Z is negative.

6.2. A random sinusoid signal is given by $X(t) = A\sin(t)$ where A is a uniform random
variable in the interval [0, 1]. Let $\mathbf{X} = (X(t_1), X(t_2), X(t_3))$ be samples of the signal taken at
times $t_1$, $t_2$, and $t_3$.
(a) Find the joint cdf of $\mathbf{X}$ in terms of the cdf of A if $t_1 = 0$, $t_2 = \pi/2$, and $t_3 = \pi$. Are
$X(t_1)$, $X(t_2)$, $X(t_3)$ independent random variables?
(b) Find the joint cdf of $\mathbf{X}$ for $t_1$, $t_2 = t_1 + \pi/2$, and $t_3 = t_1 + \pi$. Let $t_1 = \pi/6$.

6.3. Let the random variables X, Y, and Z be independent random variables. Find the
following probabilities in terms of $F_X(x)$, $F_Y(y)$, and $F_Z(z)$.
(a) $P[|X| < 5, Y < 4, Z^3 > 8]$.
(b) $P[X = 5, Y < 0, Z > 1]$.
(c) $P[\min(X, Y, Z) < 2]$.
(d) $P[\max(X, Y, Z) > 6]$.

6.4. A radio transmitter sends a signal s > 0 to a receiver using three paths. The signals that
arrive at the receiver along each path are:
$X_1 = s + N_1$, $X_2 = s + N_2$, and $X_3 = s + N_3$,
where $N_1$, $N_2$, and $N_3$ are independent Gaussian random variables with zero mean and
unit variance.
(a) Find the joint pdf of $\mathbf{X} = (X_1, X_2, X_3)$. Are $X_1$, $X_2$, and $X_3$ independent random
variables?
(b) Find the probability that the minimum of all three signals is positive.
(c) Find the probability that a majority of the signals are positive.

6.5. An urn contains one black ball and two white balls. Three balls are drawn from the urn.
Let $I_k = 1$ if the outcome of the kth draw is the black ball and let $I_k = 0$ otherwise. Define
the following three random variables:
$X = I_1 + I_2 + I_3$,
$Y = \min\{I_1, I_2, I_3\}$,
$Z = \max\{I_1, I_2, I_3\}$.
(a) Specify the range of values of the triplet (X, Y, Z) if each ball is put back into the urn
after each draw; find the joint pmf for (X, Y, Z).
(b) In part a, are X, Y, and Z independent? Are X and Y independent?
(c) Repeat part a if each ball is not put back into the urn after each draw.

6.6. Consider the packet switch in Example 6.1. Suppose that each input has one packet with
probability p and no packets with probability 1 − p. Packets are equally likely to be

destined to each of the outputs. Let $X_1$, $X_2$, and $X_3$ be the number of packet arrivals
destined for output 1, 2, and 3, respectively.
(a) Find the joint pmf of $X_1$, $X_2$, and $X_3$. Hint: Imagine that every input has a packet go
to a fictional port 4 with probability 1 − p.
(b) Find the joint pmf of $X_1$ and $X_3$.
(c) Find the pmf of $X_2$.
(d) Are $X_1$, $X_2$, and $X_3$ independent random variables?
(e) Suppose that each output will accept at most one packet and discard all additional
packets destined to it. Find the average number of packets discarded by the module
in each T-second period.

6.7. Let X, Y, Z have joint pdf
$f_{X,Y,Z}(x, y, z) = k(x + y + z)$ for $0 \le x \le 1$, $0 \le y \le 1$, $0 \le z \le 1$.
(a) Find k.
(b) Find $f_X(x \mid y, z)$ and $f_Z(z \mid x, y)$.
(c) Find $f_X(x)$, $f_Y(y)$, and $f_Z(z)$.

6.8. A point $\mathbf{X} = (X, Y, Z)$ is selected at random inside the unit sphere.
(a) Find the marginal joint pdf of Y and Z.
(b) Find the marginal pdf of Y.
(c) Find the conditional joint pdf of X and Y given Z.
(d) Are X, Y, and Z independent random variables?
(e) Find the joint pdf of $\mathbf{X}$ given that the distance from $\mathbf{X}$ to the origin is greater than 1/2
and all the components of $\mathbf{X}$ are positive.

6.9. Show that $p_{X_1,X_2,X_3}(x_1, x_2, x_3) = p_{X_3}(x_3 \mid x_1, x_2)\,p_{X_2}(x_2 \mid x_1)\,p_{X_1}(x_1)$.

6.10. Let $X_1, X_2, \ldots, X_n$ be binary random variables taking on values 0 or 1 to denote whether
a speaker is silent (0) or active (1). A silent speaker remains idle at the next time slot with
probability 3/4, and an active speaker remains active with probability 1/2. Find the joint
pmf for $X_1$, $X_2$, $X_3$, and the marginal pmf of $X_3$. Assume that the speaker begins in the
silent state.

6.11. Show that $f_{X,Y,Z}(x, y, z) = f_Z(z \mid x, y)\,f_Y(y \mid x)\,f_X(x)$.

6.12. Let $U_1$, $U_2$, and $U_3$ be independent random variables and let $X = U_1$, $Y = U_1 + U_2$, and
$Z = U_1 + U_2 + U_3$.
(a) Use the result in Problem 6.11 to find the joint pdf of X, Y, and Z.
(b) Let the $U_i$ be independent uniform random variables in the interval [0, 1]. Find the
marginal joint pdf of Y and Z. Find the marginal pdf of Z.
(c) Let the $U_i$ be independent zero-mean, unit-variance Gaussian random variables.
Find the marginal pdf of Y and Z. Find the marginal pdf of Z.

6.13. Let $X_1$, $X_2$, and $X_3$ be the multiplicative sequence in Example 6.7.
(a) Find, plot, and compare the marginal pdfs of $X_1$, $X_2$, and $X_3$.
(b) Find the conditional pdf of $X_3$ given $X_1 = x$.
(c) Find the conditional pdf of $X_1$ given $X_3 = z$.

6.14. Requests at an online music site are categorized as follows: Requests for most popular
title with $p_1 = 1/2$; second most popular title with $p_2 = 1/4$; third most popular title with
$p_3 = 1/8$; and other $p_4 = 1 - p_1 - p_2 - p_3 = 1/8$. Suppose there are a total number of
n requests in T seconds. Let $X_k$ be the number of times category k occurs.
(a) Find the joint pmf of $(X_1, X_2, X_3)$.
(b) Find the marginal pmf of $(X_1, X_2)$. Hint: Use the binomial theorem.
(c) Find the marginal pmf of $X_1$.
(d) Find the conditional joint pmf of $(X_2, X_3)$ given $X_1 = m$, where $0 \le m \le n$.

6.15. The number N of requests at the online music site in Problem 6.14 is a Poisson random
variable with mean $\alpha$ customers per second. Let $X_k$ be the number of type k requests in
T seconds. Find the joint pmf of $(X_1, X_2, X_3, X_4)$.

6.16. A random experiment has four possible outcomes. Suppose that the experiment is
repeated n independent times and let $X_k$ be the number of times outcome k occurs. The
joint pmf of $(X_1, X_2, X_3)$ is given by

$$p(k_1, k_2, k_3) = \frac{n!\,3!}{(n+3)!} = \binom{n+3}{3}^{-1} \quad \text{for } 0 \le k_i \text{ and } k_1 + k_2 + k_3 \le n.$$

(a) Find the marginal pmf of $(X_1, X_2)$.
(b) Find the marginal pmf of $X_1$.
(c) Find the conditional joint pmf of $(X_2, X_3)$ given $X_1 = m$, where $0 \le m \le n$.

6.17. The number of requests of types 1, 2, and 3, respectively, arriving at a service station in
t seconds are independent Poisson random variables with means $\lambda_1 t$, $\lambda_2 t$, and $\lambda_3 t$. Let
$N_1$, $N_2$, and $N_3$ be the number of requests that arrive during an exponentially distributed
time T with mean $\alpha t$.
(a) Find the joint pmf of $N_1$, $N_2$, and $N_3$.
(b) Find the marginal pmf of $N_1$.
(c) Find the conditional pmf of $N_1$ and $N_2$, given $N_3$.


Section 6.2: Functions of Several Random Variables 


6.18. N devices are installed at the same time. Let Y be the time until the first device fails.
(a) Find the pdf of Y if the lifetimes of the devices are independent and have the same
Pareto distribution.
(b) Repeat part a if the device lifetimes have a Weibull distribution.

6.19. In Problem 6.18 let $I_k(t)$ be the indicator function for the event "kth device is still
working at time t." Let N(t) be the number of devices still working at time t:
$N(t) = I_1(t) + I_2(t) + \cdots + I_N(t)$. Find the pmf of N(t) as well as its mean and variance.

6.20. A diversity receiver receives N independent versions of a signal. Each signal version has
an amplitude $X_k$ that is Rayleigh distributed. The receiver selects that signal with the
largest amplitude $X_k^2$. A signal is not useful if the squared amplitude falls below a
threshold $\gamma$. Find the probability that all N signals are below the threshold.

6.21. (Haykin) A receiver in a multiuser communication system accepts K binary signals from
K independent transmitters: $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_K)$, where $Y_k$ is the received signal from
the kth transmitter. In an ideal system the received vector is given by:

$$\mathbf{Y} = \mathbf{A}\mathbf{b} + \mathbf{N}$$

where $\mathbf{A} = [\alpha_k]$ is a diagonal matrix of positive channel gains, $\mathbf{b} = (b_1, b_2, \ldots, b_K)$ is
the vector of bits from each of the transmitters where $b_k = \pm 1$, and $\mathbf{N}$ is a vector of K


independent zero-mean, unit-variance Gaussian random variables.
(a) Find the joint pdf of $\mathbf{Y}$.
(b) Suppose $\mathbf{b} = (1, 1, \ldots, 1)$, find the probability that all components of $\mathbf{Y}$ are positive.

6.22. (a) Find the joint pdf of $U = X_1$, $V = X_1 + X_2$, and $W = X_1 + X_2 + X_3$.
(b) Evaluate the joint pdf of (U, V, W) if the $X_i$ are independent zero-mean, unit-variance
Gaussian random variables.
(c) Find the marginal pdf of V and of W.

6.23. (a) Find the joint pdf of the sample mean and variance of two random variables:

$$M = \frac{X_1 + X_2}{2}, \qquad V = \frac{(X_1 - M)^2 + (X_2 - M)^2}{2}$$

in terms of the joint pdf of $X_1$ and $X_2$.
(b) Evaluate the joint pdf if $X_1$ and $X_2$ are independent Gaussian random variables with
the same mean 1 and variance 1.
(c) Evaluate the joint pdf if $X_1$ and $X_2$ are independent exponential random variables
with the same parameter 1.

6.24. (a) Use the auxiliary variable method to find the pdf of

$$Z = \frac{X}{X + Y}.$$

(b) Find the pdf of Z if X and Y are independent exponential random variables with the
parameter 1.
(c) Repeat part b if X and Y are independent Pareto random variables with parameters
$k = 2$ and $x_m = 1$.

6.25. Repeat Problem 6.24 parts a and b for $Z = X/Y$.

6.26. Let X and Y be zero-mean, unit-variance Gaussian random variables with correlation
coefficient 1/2. Find the joint pdf of $U = X^2$ and $V = Y^4$.

6.27. Use auxiliary variables to find the pdf of $Z = X_1X_2X_3$ where the $X_i$ are independent
random variables that are uniformly distributed in [0, 1].

6.28. Let X, Y, and Z be independent zero-mean, unit-variance Gaussian random variables.
(a) Find the pdf of $R = (X^2 + Y^2 + Z^2)^{1/2}$.
(b) Find the pdf of $R^2 = X^2 + Y^2 + Z^2$.

6.29. Let $X_1$, $X_2$, $X_3$, $X_4$ be processed as follows:
$Y_1 = X_1$, $Y_2 = X_1 + X_2$, $Y_3 = X_2 + X_3$, $Y_4 = X_3 + X_4$.
(a) Find an expression for the joint pdf of $\mathbf{Y} = (Y_1, Y_2, Y_3, Y_4)$ in terms of the joint pdf
of $\mathbf{X} = (X_1, X_2, X_3, X_4)$.
(b) Find the joint pdf of $\mathbf{Y}$ if $X_1$, $X_2$, $X_3$, $X_4$ are independent zero-mean, unit-variance
Gaussian random variables.


Section 6.3: Expected Values of Vector Random Variables 


6.30. Find E[M], E[V], and E[MV] in Problem 6.23c.

6.31. Compute E[Z] in Problem 6.27 in two ways:
(a) by integrating over $f_Z(z)$;
(b) by integrating over the joint pdf of $(X_1, X_2, X_3)$.

6.32. Find the mean vector and covariance matrix for three multipath signals $\mathbf{X} = (X_1, X_2, X_3)$
in Problem 6.4.

6.33. Find the mean vector and covariance matrix for the samples of the sinusoidal signals
$\mathbf{X} = (X(t_1), X(t_2), X(t_3))$ in Problem 6.2.

6.34. (a) Find the mean vector and covariance matrix for (X, Y, Z) in Problem 6.5a.
(b) Repeat part a for Problem 6.5c.

6.35. Find the mean vector and covariance matrix for (X, Y, Z) in Problem 6.7.

6.36. Find the mean vector and covariance matrix for the point (X, Y, Z) inside the unit sphere
in Problem 6.8.

6.37. (a) Use the results of Problem 6.6c to find the mean vector for the packet arrivals
$X_1$, $X_2$, and $X_3$ in Example 6.5.
(b) Use the results of Problem 6.6b to find the covariance matrix.
(c) Explain why $X_1$, $X_2$, and $X_3$ are correlated.

6.38. Find the mean vector and covariance matrix for the joint number of packet arrivals in a
random time $N_1$, $N_2$, and $N_3$ in Problem 6.17. Hint: Use conditional expectation.

6.39. (a) Find the mean vector and covariance matrix of (U, V, W) in terms of $(X_1, X_2, X_3)$ in
Problem 6.22b.
(b) Find the cross-covariance matrix between (U, V, W) and $(X_1, X_2, X_3)$.

6.40. (a) Find the mean vector and covariance matrix of $\mathbf{Y} = (Y_1, Y_2, Y_3, Y_4)$ in terms of
those of $\mathbf{X} = (X_1, X_2, X_3, X_4)$ in Problem 6.29.
(b) Find the cross-covariance matrix between $\mathbf{Y}$ and $\mathbf{X}$.
(c) Evaluate the mean vector, covariance, and cross-covariance matrices if $X_1$, $X_2$, $X_3$, $X_4$
are independent random variables.
(d) Generalize the results in part c to $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_{n-1}, Y_n)$.

6.41. Let $\mathbf{X} = (X_1, X_2, X_3, X_4)$ consist of equal mean, independent, unit-variance random
variables. Find the mean vector, covariance, and cross-covariance matrices of $\mathbf{Y} = \mathbf{A}\mathbf{X}$:

$$\text{(a)}\ \mathbf{A} = \begin{bmatrix} 1 & 1/2 & 1/4 & 1/8 \\ 0 & 1 & 1/2 & 1/4 \\ 0 & 0 & 1 & 1/2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad \text{(b)}\ \mathbf{A} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$

6.42. Let $W = aX + bY + c$, where X and Y are random variables.
(a) Find the characteristic function of W in terms of the joint characteristic function of
X and Y.
(b) Find the characteristic function of W if X and Y are the random variables discussed
in Example 6.19. Find the pdf of W.


6.43. (a) Find the joint characteristic function of the jointly Gaussian random variables X and
Y introduced in Example 5.45. Hint: Consider X and Y as a transformation of the
independent Gaussian random variables V and W.
(b) Find $E[X^2Y]$.
(c) Find the joint characteristic function of $X' = X + a$ and $Y' = Y + b$.

6.44. Let $X = aU + bV$ and $Y = cU + dV$, where $|ad - bc| \ne 0$.
(a) Find the joint characteristic function of X and Y in terms of the joint characteristic
function of U and V.
(b) Find an expression for E[XY] in terms of joint moments of U and V.

6.45. Let X and Y be nonnegative, integer-valued random variables. The joint probability
generating function is defined by

$$G_{X,Y}(z_1, z_2) = E[z_1^X z_2^Y] = \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} z_1^j z_2^k\, P[X = j, Y = k].$$

(a) Find the joint pgf for two independent Poisson random variables with parameters $\alpha_1$
and $\alpha_2$.
(b) Find the joint pgf for two independent binomial random variables with parameters
(n, p) and (m, p).

6.46. Suppose that X and Y have joint pgf

$$G_{X,Y}(z_1, z_2) = e^{\alpha_1(z_1 - 1) + \alpha_2(z_2 - 1) + \beta(z_1 z_2 - 1)}.$$

(a) Use the marginal pgf's to show that X and Y are Poisson random variables.
(b) Find the pgf of $Z = X + Y$. Is Z a Poisson random variable?

6.47. Let X and Y be trinomial random variables with joint pmf

$$P[X = j, Y = k] = \frac{n!\, p_1^j\, p_2^k\, (1 - p_1 - p_2)^{n-j-k}}{j!\, k!\, (n-j-k)!} \quad \text{for } 0 \le j, k \text{ and } j + k \le n.$$

(a) Find the joint pgf of X and Y.
(b) Find the correlation and covariance of X and Y.

6.48. Find the mean vector and covariance matrix for (X, Y) in Problem 6.46.

6.49. Find the mean vector and covariance matrix for (X, Y) in Problem 6.47.

6.50. Let $\mathbf{X} = (X_1, X_2)$ have covariance matrix:


EAN E
e N


(a) Find the eigenvalues and eigenvectors of $\mathbf{K}_{\mathbf{X}}$.
(b) Find the orthogonal matrix $\mathbf{P}$ that diagonalizes $\mathbf{K}_{\mathbf{X}}$. Verify that $\mathbf{P}$ is orthogonal and
that $\mathbf{P}^T\mathbf{K}_{\mathbf{X}}\mathbf{P} = \boldsymbol{\Lambda}$.
(c) Express $\mathbf{X}$ in terms of the eigenvectors of $\mathbf{K}_{\mathbf{X}}$ using the Karhunen-Loeve expansion.

6.51. Repeat Problem 6.50 for $\mathbf{X} = (X_1, X_2, X_3)$ with covariance matrix:

$$\mathbf{K}_{\mathbf{X}} = \begin{bmatrix} 1 & -1/2 & -1/2 \\ -1/2 & 1 & -1/2 \\ -1/2 & -1/2 & 1 \end{bmatrix}.$$

6.52. A square matrix $\mathbf{A}$ is said to be nonnegative definite if for any vector
$\mathbf{a} = (a_1, a_2, \ldots, a_n)^T$: $\mathbf{a}^T\mathbf{A}\mathbf{a} \ge 0$. Show that the covariance matrix is nonnegative
definite. Hint: Use the fact that $E[(\mathbf{a}^T(\mathbf{X} - \mathbf{m}_{\mathbf{X}}))^2] \ge 0$.

6.53. $\mathbf{A}$ is positive definite if for any nonzero vector $\mathbf{a} = (a_1, a_2, \ldots, a_n)^T$: $\mathbf{a}^T\mathbf{A}\mathbf{a} > 0$.
(a) Show that if all the eigenvalues are positive, then $\mathbf{K}_{\mathbf{X}}$ is positive definite. Hint: Let
$\mathbf{b} = \mathbf{P}^T\mathbf{a}$.
(b) Show that if $\mathbf{K}_{\mathbf{X}}$ is positive definite, then all the eigenvalues are positive. Hint: Let $\mathbf{a}$
be an eigenvector of $\mathbf{K}_{\mathbf{X}}$.


Section 6.4: Jointly Gaussian Random Vectors 


6.54. Let $\mathbf{X} = (X_1, X_2)$ be the jointly Gaussian random variables with mean vector and
covariance matrix given by:

$$\mathbf{m}_{\mathbf{X}} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \mathbf{K}_{\mathbf{X}} = \begin{bmatrix} 3/2 & -1/2 \\ -1/2 & 3/2 \end{bmatrix}.$$

(a) Find the pdf of $\mathbf{X}$ in matrix notation.
(b) Find the pdf of $\mathbf{X}$ using the quadratic expression in the exponent.
(c) Find the marginal pdfs of $X_1$ and $X_2$.
(d) Find a transformation $\mathbf{A}$ such that the vector $\mathbf{Y} = \mathbf{A}\mathbf{X}$ consists of independent
Gaussian random variables.
(e) Find the joint pdf of $\mathbf{Y}$.

6.55. Let $\mathbf{X} = (X_1, X_2, X_3)$ be the jointly Gaussian random variables with mean vector and
covariance matrix given by:

$$\mathbf{m}_{\mathbf{X}} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} \qquad \mathbf{K}_{\mathbf{X}} = \begin{bmatrix} 3/2 & 0 & 1/2 \\ 0 & 1 & 0 \\ 1/2 & 0 & 3/2 \end{bmatrix}.$$

(a) Find the pdf of $\mathbf{X}$ in matrix notation.
(b) Find the pdf of $\mathbf{X}$ using the quadratic expression in the exponent.
(c) Find the marginal pdfs of $X_1$, $X_2$, and $X_3$.
(d) Find a transformation $\mathbf{A}$ such that the vector $\mathbf{Y} = \mathbf{A}\mathbf{X}$ consists of independent
Gaussian random variables.
(e) Find the joint pdf of $\mathbf{Y}$.

6.56. Let $U_1$, $U_2$, and $U_3$ be independent zero-mean, unit-variance Gaussian random variables
and let $X = U_1$, $Y = U_1 + U_2$, and $Z = U_1 + U_2 + U_3$.
(a) Find the covariance matrix of (X, Y, Z).
(b) Find the joint pdf of (X, Y, Z).
(c) Find the conditional pdf of Y and Z given X.
(d) Find the conditional pdf of Z given X and Y.

6.57. Let $X_1$, $X_2$, $X_3$, $X_4$ be independent zero-mean, unit-variance Gaussian random variables
that are processed as follows:
$Y_1 = X_1 + X_2$, $Y_2 = X_2 + X_3$, $Y_3 = X_3 + X_4$.
(a) Find the covariance matrix of $\mathbf{Y} = (Y_1, Y_2, Y_3)$.
(b) Find the joint pdf of $\mathbf{Y}$.
(c) Find the joint pdf of $Y_1$ and $Y_2$; $Y_1$ and $Y_3$.
(d) Find a transformation $\mathbf{A}$ such that the vector $\mathbf{Z} = \mathbf{A}\mathbf{Y}$ consists of independent
Gaussian random variables.


6.58. A more realistic model of the receiver in the multiuser communication system in
Problem 6.21 has the K received signals $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_K)$ given by:

$$\mathbf{Y} = \mathbf{A}\mathbf{R}\mathbf{b} + \mathbf{N}$$

where $\mathbf{A} = [\alpha_k]$ is a diagonal matrix of positive channel gains, $\mathbf{R}$ is a symmetric matrix
that accounts for the interference between users, and $\mathbf{b} = (b_1, b_2, \ldots, b_K)$ is the vector of
bits from each of the transmitters. $\mathbf{N}$ is the vector of K independent zero-mean, unit-variance
Gaussian noise random variables.
(a) Find the joint pdf of $\mathbf{Y}$.
(b) Suppose that in order to recover $\mathbf{b}$, the receiver computes $\mathbf{Z} = (\mathbf{A}\mathbf{R})^{-1}\mathbf{Y}$. Find the
joint pdf of $\mathbf{Z}$.

6.59. (a) Let $\mathbf{K}_3$ be the covariance matrix in Problem 6.55. Find the corresponding $\mathbf{Q}_2$ and $\mathbf{Q}_3$
in Example 6.23.
(b) Find the conditional pdf of $X_3$ given $X_1$ and $X_2$.

6.60. In Example 6.23, show that:


(Xp T m,)"Q,(x„, m,) (Xn m,—-1)'Q,-1(%n—-1 T m,,-1)


z Onn{ (Xn g My) T By? Ta OnnB?


1 n-1
where B = g, oie —m) and |K,|/|K,-1] = Qnn-


nn j=


6.61. Find the pdf of the sum of Gaussian random variables in the following cases:
(a) $Z = X_1 + X_2 + X_3$ in Problem 6.55.
(b) $Z = X + Y + Z$ in Problem 6.56.
(c) $Z = Y_1 + Y_2 + Y_3$ in Problem 6.57.

6.62. Find the joint characteristic function of the jointly Gaussian random vector $\mathbf{X}$ in Problem 6.54.

6.63. Suppose that a jointly Gaussian random vector $\mathbf{X}$ has zero mean vector and the
covariance matrix given in Problem 6.51.
(a) Find the joint characteristic function.
(b) Can you obtain an expression for the joint pdf? Explain your answer.

6.64. Let X and Y be jointly Gaussian random variables. Derive the joint characteristic
function for X and Y using conditional expectation.

6.65. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be jointly Gaussian random variables. Derive the
characteristic function for $\mathbf{X}$ by carrying out the integral in Eq. (6.32). Hint: You will need to
complete the square as follows:

$$(\mathbf{x} - j\mathbf{K}\boldsymbol{\omega})^T\mathbf{K}^{-1}(\mathbf{x} - j\mathbf{K}\boldsymbol{\omega}) = \mathbf{x}^T\mathbf{K}^{-1}\mathbf{x} - 2j\,\mathbf{x}^T\boldsymbol{\omega} + j^2\,\boldsymbol{\omega}^T\mathbf{K}\boldsymbol{\omega}.$$

6.66. Find $E[X^2Y^2]$ for jointly Gaussian random variables from the characteristic function.

6.67. Let $\mathbf{X} = (X_1, X_2, X_3, X_4)$ be zero-mean jointly Gaussian random variables. Show that
$E[X_1X_2X_3X_4] = E[X_1X_2]E[X_3X_4] + E[X_1X_3]E[X_2X_4] + E[X_1X_4]E[X_2X_3]$.


Section 6.5: Mean Square Estimation 


6.68. Let $X$ and $Y$ be discrete random variables with three possible joint pmf's:

(i)
X/Y    -1     0     1
-1     1/6   1/6    0
 0      0     0    1/3
 1     1/6   1/6    0

(ii)
X/Y    -1     0     1
-1     1/9   1/9   1/9
 0     1/9   1/9   1/9
 1     1/9   1/9   1/9

(iii)
X/Y    -1     0     1
-1     1/3    0     0
 0      0    1/3    0
 1      0     0    1/3

(a) Find the minimum mean square error linear estimator for $Y$ given $X$.

(b) Find the minimum mean square error estimator for $Y$ given $X$.

(c) Find the MAP and ML estimators for $Y$ given $X$.

(d) Compare the mean square error of the estimators in parts a, b, and c.

6.69. Repeat Problem 6.68 for the continuous random variables $X$ and $Y$ in Problem 5.26.

6.70. Find the ML estimator for the signal $s$ in Problem 6.4.

6.71. Let $N_1$ be the number of Web page requests arriving at a server in the period (0, 100) ms and let $N_2$ be the total combined number of Web page requests arriving at a server in the period (0, 200) ms. Assume page requests occur every 1-ms interval according to independent Bernoulli trials with probability of success $p$.

(a) Find the minimum linear mean square estimator for $N_2$ given $N_1$ and the associated mean square error.

(b) Find the minimum mean square error estimator for $N_2$ given $N_1$ and the associated mean square error.

(c) Find the maximum a posteriori estimator for $N_2$ given $N_1$.

(d) Repeat parts a, b, and c for the estimation of $N_1$ given $N_2$.

6.72. Let $Y = X + N$ where $X$ and $N$ are independent Gaussian random variables with different variances and $N$ is zero mean.

(a) Plot the correlation coefficient between the "observed signal" $Y$ and the "desired signal" $X$ as a function of the signal-to-noise ratio $\sigma_X/\sigma_N$.

(b) Find the minimum mean square error estimator for $X$ given $Y$.

(c) Find the MAP and ML estimators for $X$ given $Y$.

(d) Compare the mean square error of the estimators in parts a, b, and c.

6.73. Let $X$, $Y$, $Z$ be the random variables in Problem 6.7.

(a) Find the minimum mean square error linear estimator for $Y$ given $X$ and $Z$.

(b) Find the minimum mean square error estimator for $Y$ given $X$ and $Z$.

(c) Find the MAP and ML estimators for $Y$ given $X$ and $Z$.

(d) Compare the mean square error of the estimators in parts b and c.

6.74. (a) Repeat Problem 6.73 for the estimator of $X_2$ given $X_1$ and $X_3$ in Problem 6.13.

(b) Repeat Problem 6.73 for the estimator of $X_3$ given $X_1$ and $X_2$.

6.75. Consider the ideal multiuser communication system in Problem 6.21. Assume the transmitted bits $b_k$ are independent and equally likely to be $+1$ or $-1$.

(a) Find the ML and MAP estimators for $\mathbf{b}$ given the observation $\mathbf{Y}$.

(b) Find the minimum mean square linear estimator for $\mathbf{b}$ given the observation $\mathbf{Y}$. How can this estimator be used in deciding what were the transmitted bits?

6.76. Repeat Problem 6.75 for the multiuser system in Problem 6.58.

6.77. A second-order predictor for samples of an image predicts the sample $E$ as a linear function of sample $D$ to its left and sample $B$ in the previous line, as shown below:

line j        ...  A  B  C  ...
line j + 1    ...  D  E

Estimate for $E = aD + bB$.

(a) Find $a$ and $b$ if all samples have variance $\sigma^2$ and if the correlation coefficient between $D$ and $E$ is $\rho$, between $B$ and $E$ is $\rho$, and between $D$ and $B$ is $\rho^2$.

(b) Find the mean square error of the predictor found in part a, and determine the reduction in the variance of the signal in going from the input to the output of the predictor.


6.78. Show that the mean square error of the two-tap linear predictor is given by Eq. (6.64).

6.79. In "hexagonal sampling" of an image, the samples in consecutive lines are offset relative to each other as shown below:

line j        ...    A   B
line j + 1    ...  C   D

The covariance between two samples $a$ and $b$ is given by $\rho^{d(a,b)}$ where $d(a, b)$ is the Euclidean distance between the points. In the above samples, the distance between $A$ and $B$, $A$ and $C$, $A$ and $D$, $C$ and $D$, and $B$ and $D$ is 1. Suppose we wish to use a two-tap linear predictor to predict the sample $D$. Which two samples from the set $\{A, B, C\}$ should we use in the predictor? What is the resulting mean square error?


*Section 6.6: Generating Correlated Vector Random Variables 


6.80. Find a linear transformation that diagonalizes K.

4 1
(b) K-i a

6.81. Generate and plot the scattergram of 1000 pairs of random variables $\mathbf{Y}$ with the covariance matrices in Problem 6.80 if:

(a) $X_1$ and $X_2$ are independent random variables that are each uniform in the unit interval;

(b) $X_1$ and $X_2$ are independent zero-mean, unit-variance Gaussian random variables.

6.82. Let $\mathbf{X} = (X_1, X_2, X_3)$ be the jointly Gaussian random variables in Problem 6.55.

(a) Find a linear transformation that diagonalizes the covariance matrix.

(b) Generate 1000 triplets of $\mathbf{Y} = \mathbf{A}\mathbf{X}$ and plot the scattergrams for $Y_1$ and $Y_2$, $Y_1$ and $Y_3$, and $Y_2$ and $Y_3$. Confirm that the scattergrams are what is expected.

6.83. Let $\mathbf{X}$ be a jointly Gaussian random vector with mean $\mathbf{m}_X$ and covariance matrix $K_X$ and let $\mathbf{A}$ be a matrix that diagonalizes $K_X$. What is the joint pdf of $\mathbf{A}(\mathbf{X} - \mathbf{m}_X)$?

6.84. Let $X_1, X_2, \ldots, X_n$ be independent zero-mean, unit-variance Gaussian random variables. Let $Y_k = (X_k + X_{k-1})/2$, that is, $Y_k$ is the moving average of pairs of values of $X$. Assume
X1,=0= X,-

(a) Find the covariance matrix of the $Y_k$'s.

(b) Use Octave to generate a sequence of 1000 samples $Y_1, \ldots, Y_n$. How would you check whether the $Y_k$'s have the correct covariances?

6.85. Repeat Problem 6.84 with $Y_k = X_k - X_{k-1}$.

6.86. Let $\mathbf{U}$ be an orthogonal matrix. Show that if $\mathbf{A}$ diagonalizes the covariance matrix $K$, then $\mathbf{B} = \mathbf{U}\mathbf{A}$ also diagonalizes $K$.

6.87. The transformation in Problem 6.56 is said to be "causal" because each output depends only on "past" inputs.

(a) Find the covariance matrix of $X$, $Y$, $Z$ in Problem 6.56.

(b) Find a noncausal transformation that diagonalizes the covariance matrix in part a.

6.88. (a) Find a causal transformation that diagonalizes the covariance matrix in Problem 6.54.

(b) Repeat for the covariance matrix in Problem 6.55.
Problems Requiring Cumulative Knowledge 


6.89. Let $U_0, U_1, \ldots$ be a sequence of independent zero-mean, unit-variance Gaussian random variables. A "low-pass filter" takes the sequence $U_i$ and produces the output sequence $X_n = (U_n + U_{n-1})/2$, and a "high-pass filter" produces the output sequence $Y_n = (U_n - U_{n-1})/2$.

(a) Find the joint pdf of $X_{n+1}$, $X_n$, and $X_{n-1}$; of $X_n$, $X_{n+m}$, and $X_{n+2m}$, $m > 1$.

(b) Repeat part a for $Y_n$.

(c) Find the joint pdf of $X_n$, $X_m$, $Y_n$, and $Y_m$.

(d) Find the corresponding joint characteristic functions in parts a, b, and c.

6.90. Let $X_1, X_2, \ldots, X_n$ be the samples of a speech waveform in Example 6.31. Suppose we want to interpolate for the value of a sample in terms of the previous and the next samples, that is, we wish to find the best linear estimate for $X_2$ in terms of $X_1$ and $X_3$.

(a) Find the coefficients of the best linear estimator (interpolator).

(b) Find the mean square error of the best linear interpolator and compare it to the mean square error of the two-tap predictor in Example 6.31.

(c) Suppose that the samples are jointly Gaussian. Find the pdf of the interpolation error.

6.91. Let $X_1, X_2, \ldots, X_n$ be samples from some signal. Suppose that the samples are jointly Gaussian random variables with covariance

$$\mathrm{COV}(X_i, X_j) = \begin{cases} \sigma^2 & \text{for } i = j \\ \rho\sigma^2 & \text{for } |i - j| = 1 \\ 0 & \text{otherwise.} \end{cases}$$

Suppose we take blocks of two consecutive samples to form a vector $\mathbf{X}$, which is then linearly transformed to form $\mathbf{Y} = \mathbf{A}\mathbf{X}$.

(a) Find the matrix $\mathbf{A}$ so that the components of $\mathbf{Y}$ are independent random variables.

(b) Let $\mathbf{X}_i$ and $\mathbf{X}_{i+1}$ be two consecutive blocks and let $\mathbf{Y}_i$ and $\mathbf{Y}_{i+1}$ be the corresponding transformed variables. Are the components of $\mathbf{Y}_i$ and $\mathbf{Y}_{i+1}$ independent?

6.92. A multiplexer combines $N$ digital television signals into a common communications line. TV signal $n$ generates $X_n$ bits every 33 milliseconds, where $X_n$ is a Gaussian random variable with mean $m$ and variance $\sigma^2$. Suppose that the multiplexer accepts a maximum total of $T$ bits from the combined sources every 33 ms, and that any bits in excess of $T$ are discarded. Assume that the $N$ signals are independent.

(a) Find the probability that bits are discarded in a given 33-ms period, if we let $T = m_a + t\sigma_a$, where $m_a$ is the mean total bits generated by the combined sources, and $\sigma_a$ is the standard deviation of the total number of bits produced by the combined sources.

(b) Find the average number of bits discarded per period.

(c) Find the long-term fraction of bits lost by the multiplexer.

(d) Find the average number of bits per source allocated in part a, and find the average number of bits lost per source. What happens as $N$ becomes large?

(e) Suppose we require that $t$ be adjusted with $N$ so that the fraction of bits lost per source is kept constant. Find an equation whose solution yields the desired value of $t$.

(f) Do the above results change if the signals have pairwise covariance $\rho$?

6.93. Consider the estimation of $T$ given $N_1$ and arrivals in Problem 6.17.

(a) Find the ML and MAP estimators for $T$.

(b) Find the linear mean square estimator for $T$.

(c) Repeat parts a and b if $N_1$ and $N_2$ are given.
CHAPTER 7

Sums of Random Variables and Long-Term Averages


Many problems involve the counting of the number of occurrences of events, the 
measurement of cumulative effects, or the computation of arithmetic averages in 
a series of measurements. Usually these problems can be reduced to the problem 
of finding, exactly or approximately, the distribution of a random variable that 
consists of the sum of n independent, identically distributed random variables. In 
this chapter, we investigate sums of random variables and their properties as n 
becomes large. 

In Section 7.1, we show how the characteristic function is used to compute the 
pdf of the sum of independent random variables. In Section 7.2, we discuss the sample 
mean estimator for the expected value of a random variable and the relative frequen- 
cy estimator for the probability of an event. We introduce measures for assessing the 
goodness of these estimators. We then discuss the laws of large numbers, which are the- 
orems that state that the sample mean and relative frequency estimators converge to 
the corresponding expected values and probabilities as the number of samples is in- 
creased. These theoretical results demonstrate the remarkable consistency between 
probability theory and observed behavior, and they reinforce the relative frequency in- 
terpretation of probability. 

In Section 7.3, we present the central limit theorem, which states that, under very 
general conditions, the cdf of a sum of random variables approaches that of a Gaussian 
random variable even though the cdf of the individual random variables may be far 
from Gaussian. This result enables us to approximate the pdf of sums of random vari- 
ables by the pdf of a Gaussian random variable. The result also explains why the 
Gaussian random variable appears in so many diverse applications. 

In Section 7.4 we consider sequences of random variables and their conver- 
gence properties. In Section 7.5 we discuss random experiments in which events 
occur at random times. In these experiments we are interested in the average rate at 
which events occur as well as the rate at which quantities associated with the events 
grow. Finally, Section 7.6 introduces computer methods based on the discrete Fourier 
transform that prove very useful in the numerical calculation of pmf’s and pdf’s from 
their transforms. 
7.1 SUMS OF RANDOM VARIABLES

Let $X_1, X_2, \ldots, X_n$ be a sequence of random variables, and let $S_n$ be their sum:

$$S_n = X_1 + X_2 + \cdots + X_n. \tag{7.1}$$

In this section, we find the mean and variance of $S_n$, as well as the pdf of $S_n$ in the important special case where the $X_j$'s are independent random variables.

7.1.1 Mean and Variance of Sums of Random Variables


In Section 6.3, it was shown that regardless of statistical dependence, the expected value of a sum of $n$ random variables is equal to the sum of the expected values:

$$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]. \tag{7.2}$$

Thus knowledge of the means of the $X_j$'s suffices to find the mean of $S_n$.

The following example shows that in order to compute the variance of a sum of random variables, we need to know the variances and covariances of the $X_j$'s.


Example 7.1 


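Find the variance of $Z = X + Y$.
From Eq. (7.2), $E[Z] = E[X + Y] = E[X] + E[Y]$. The variance of $Z$ is therefore

$$\begin{aligned}
\mathrm{VAR}(Z) &= E[(Z - E[Z])^2] = E[(X + Y - E[X] - E[Y])^2] \\
&= E[\{(X - E[X]) + (Y - E[Y])\}^2] \\
&= E[(X - E[X])^2 + (Y - E[Y])^2 + (X - E[X])(Y - E[Y]) \\
&\qquad + (Y - E[Y])(X - E[X])] \\
&= \mathrm{VAR}[X] + \mathrm{VAR}[Y] + \mathrm{COV}(X, Y) + \mathrm{COV}(Y, X) \\
&= \mathrm{VAR}[X] + \mathrm{VAR}[Y] + 2\,\mathrm{COV}(X, Y).
\end{aligned}$$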

In general, the covariance COV(X, Y) is not equal to zero, so the variance of a sum is not neces- 
sarily equal to the sum of the individual variances. 
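The identity in Example 7.1 can be checked exactly on a small joint pmf. The sketch below is only an illustration: the joint pmf `pmf` and the helper `expect` are hypothetical choices made for this check and do not come from the text.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): keys are (x, y) pairs, values are probabilities.
pmf = {(0, 0): F(1, 2), (0, 1): F(1, 8), (1, 0): F(1, 8), (1, 1): F(1, 4)}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

mx, my = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mx) ** 2)
var_y = expect(lambda x, y: (y - my) ** 2)
cov_xy = expect(lambda x, y: (x - mx) * (y - my))

mz = expect(lambda x, y: x + y)
var_z = expect(lambda x, y: (x + y - mz) ** 2)   # VAR(X + Y) computed directly
rhs = var_x + var_y + 2 * cov_xy                 # right-hand side of Example 7.1

print(var_z, rhs, cov_xy)
```

Because the arithmetic uses exact fractions, the two sides agree exactly rather than to within rounding error. In this pmf the covariance is positive, so the variance of the sum exceeds the sum of the variances.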


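The result in Example 7.1 can be generalized to the case of $n$ random variables:

$$\begin{aligned}
\mathrm{VAR}(X_1 + X_2 + \cdots + X_n) &= E\left\{\sum_{j=1}^{n}(X_j - E[X_j])\sum_{k=1}^{n}(X_k - E[X_k])\right\} \\
&= \sum_{j=1}^{n}\sum_{k=1}^{n} E[(X_j - E[X_j])(X_k - E[X_k])] \\
&= \sum_{k=1}^{n}\mathrm{VAR}(X_k) + \sum_{j=1}^{n}\sum_{\substack{k=1 \\ k \neq j}}^{n}\mathrm{COV}(X_j, X_k).
\end{aligned} \tag{7.3}$$

Thus in general, the variance of a sum of random variables is not equal to the sum of the individual variances.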
An important special case is when the $X_j$'s are independent random variables. If $X_1, X_2, \ldots, X_n$ are independent random variables, then $\mathrm{COV}(X_j, X_k) = 0$ for $j \neq k$ and

$$\mathrm{VAR}(X_1 + X_2 + \cdots + X_n) = \mathrm{VAR}(X_1) + \cdots + \mathrm{VAR}(X_n). \tag{7.4}$$


Example 7.2 Sum of iid Random Variables

Find the mean and variance of the sum of $n$ independent, identically distributed (iid) random variables, each with mean $\mu$ and variance $\sigma^2$.
The mean of $S_n$ is obtained from Eq. (7.2):

$$E[S_n] = E[X_1] + \cdots + E[X_n] = n\mu.$$

The covariance of pairs of independent random variables is zero, so by Eq. (7.4),

$$\mathrm{VAR}[S_n] = n\,\mathrm{VAR}[X_j] = n\sigma^2,$$

since $\mathrm{VAR}[X_j] = \sigma^2$ for $j = 1, \ldots, n$.
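The result of Example 7.2 can be confirmed exactly for a concrete case. The following is a minimal sketch, assuming the iid variables are rolls of a fair die; the die, the value of `n`, and the helper names are illustrative assumptions. It builds the pmf of the sum by repeated convolution and compares its mean and variance with $n\mu$ and $n\sigma^2$.

```python
from fractions import Fraction

def convolve(pmf_a, pmf_b):
    """pmf of A + B for independent discrete A and B, given as {value: prob} dicts."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out

def mean_var(pmf):
    """Mean and variance of a discrete pmf."""
    m = sum(x * p for x, p in pmf.items())
    v = sum((x - m) ** 2 * p for x, p in pmf.items())
    return m, v

die = {k: Fraction(1, 6) for k in range(1, 7)}   # assumed distribution of each X_j
mu, sigma2 = mean_var(die)

n = 5
pmf_sum = {0: Fraction(1)}                       # pmf of the empty sum
for _ in range(n):
    pmf_sum = convolve(pmf_sum, die)             # add one more independent die

mean_sn, var_sn = mean_var(pmf_sum)
print(mean_sn, n * mu, var_sn, n * sigma2)
```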


7.1.2 pdf of Sums of Independent Random Variables

Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables. In this section we show how transform methods can be used to find the pdf of $S_n = X_1 + X_2 + \cdots + X_n$.

First, consider the $n = 2$ case, $Z = X + Y$, where $X$ and $Y$ are independent random variables. The characteristic function of $Z$ is given by

$$\begin{aligned}
\Phi_Z(\omega) &= E[e^{j\omega Z}] \\
&= E[e^{j\omega(X + Y)}] \\
&= E[e^{j\omega X}e^{j\omega Y}] \\
&= E[e^{j\omega X}]E[e^{j\omega Y}] \\
&= \Phi_X(\omega)\Phi_Y(\omega),
\end{aligned} \tag{7.5}$$

where the fourth equality follows from the fact that functions of independent random variables (i.e., $e^{j\omega X}$ and $e^{j\omega Y}$) are also independent random variables, as discussed in Example 5.25. Thus the characteristic function of $Z$ is the product of the individual characteristic functions of $X$ and $Y$.

In Example 5.39, we saw that the pdf of $Z = X + Y$ is given by the convolution of the pdf's of $X$ and $Y$:

$$f_Z(z) = f_X(x) * f_Y(y). \tag{7.6}$$

Recall that $\Phi_Z(\omega)$ can also be viewed as the Fourier transform of the pdf of $Z$:

$$\Phi_Z(\omega) = \mathcal{F}\{f_Z(z)\}.$$

By equating the transform of Eq. (7.6) to Eq. (7.5) we obtain

$$\Phi_Z(\omega) = \mathcal{F}\{f_Z(z)\} = \mathcal{F}\{f_X(x) * f_Y(y)\} = \Phi_X(\omega)\Phi_Y(\omega). \tag{7.7}$$

Equation (7.7) states the well-known result that the Fourier transform of a convolution of two functions is equal to the product of the individual Fourier transforms.
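The relationship in Eq. (7.7) has a direct discrete analogue that is easy to check numerically. The sketch below is an illustration under stated assumptions: it uses two hypothetical discrete random variables (a fair die and a fair coin), forms the pmf of their sum by convolution, and compares the characteristic function of the sum with the product of the individual characteristic functions at one arbitrarily chosen frequency.

```python
import cmath
from collections import defaultdict

def char_fn(pmf, w):
    """Characteristic function E[exp(j*w*X)] of a discrete pmf {value: prob}."""
    return sum(p * cmath.exp(1j * w * x) for x, p in pmf.items())

def convolve(pmf_x, pmf_y):
    """pmf of Z = X + Y for independent X and Y (discrete convolution)."""
    pmf_z = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            pmf_z[x + y] += px * py
    return dict(pmf_z)

die = {k: 1 / 6 for k in range(1, 7)}        # fair die (illustrative choice)
coin = {0: 0.5, 1: 0.5}                      # fair coin (illustrative choice)
pmf_z = convolve(die, coin)

w = 0.7                                      # arbitrary test frequency
lhs = char_fn(pmf_z, w)                      # transform of the convolution
rhs = char_fn(die, w) * char_fn(coin, w)     # product of the transforms
print(abs(lhs - rhs))
```

The printed difference should be at the level of floating-point rounding, for this or any other value of `w`.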
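Now consider the sum of $n$ independent random variables:

$$S_n = X_1 + X_2 + \cdots + X_n.$$

The characteristic function of $S_n$ is

$$\begin{aligned}
\Phi_{S_n}(\omega) &= E[e^{j\omega S_n}] = E[e^{j\omega(X_1 + X_2 + \cdots + X_n)}] \\
&= E[e^{j\omega X_1}] \cdots E[e^{j\omega X_n}] \\
&= \Phi_{X_1}(\omega) \cdots \Phi_{X_n}(\omega).
\end{aligned} \tag{7.8}$$

Thus the pdf of $S_n$ can be found by taking the inverse Fourier transform of the product of the individual characteristic functions of the $X_j$'s:

$$f_{S_n}(x) = \mathcal{F}^{-1}\{\Phi_{X_1}(\omega) \cdots \Phi_{X_n}(\omega)\}. \tag{7.9}$$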


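Example 7.3 Sum of Independent Gaussian Random Variables

Let $S_n$ be the sum of $n$ independent Gaussian random variables with respective means and variances, $m_1, \ldots, m_n$ and $\sigma_1^2, \ldots, \sigma_n^2$. Find the pdf of $S_n$.
The characteristic function of $X_k$ is

$$\Phi_{X_k}(\omega) = e^{+j\omega m_k - \omega^2\sigma_k^2/2},$$

so by Eq. (7.8),

$$\begin{aligned}
\Phi_{S_n}(\omega) &= \prod_{k=1}^{n} e^{+j\omega m_k - \omega^2\sigma_k^2/2} \\
&= \exp\{+j\omega(m_1 + \cdots + m_n) - \omega^2(\sigma_1^2 + \cdots + \sigma_n^2)/2\}.
\end{aligned}$$

This is the characteristic function of a Gaussian random variable. Thus $S_n$ is a Gaussian random variable with mean $m_1 + \cdots + m_n$ and variance $\sigma_1^2 + \cdots + \sigma_n^2$.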


Example 7.4 Sum of iid Random Variables

Find the pdf of a sum of $n$ independent, identically distributed random variables with characteristic functions

$$\Phi_{X_k}(\omega) = \Phi_X(\omega) \quad \text{for } k = 1, \ldots, n.$$

Equation (7.8) immediately implies that the characteristic function of $S_n$ is

$$\Phi_{S_n}(\omega) = \{\Phi_X(\omega)\}^n. \tag{7.10}$$

The pdf of $S_n$ is found by taking the inverse transform of this expression.
Example 7.5 Sum of iid Exponential Random Variables

Find the pdf of a sum of $n$ independent exponentially distributed random variables, all with parameter $\alpha$.
The characteristic function of a single exponential random variable is

$$\Phi_X(\omega) = \frac{\alpha}{\alpha - j\omega}.$$

From the previous example we then have that

$$\Phi_{S_n}(\omega) = \left\{\frac{\alpha}{\alpha - j\omega}\right\}^n.$$

From Table 4.1, we see that $S_n$ is an $m$-Erlang random variable with $m = n$.


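When dealing with integer-valued random variables it is usually preferable to work with the probability generating function

$$G_N(z) = E[z^N].$$

The generating function for a sum of independent discrete random variables, $N = X_1 + \cdots + X_n$, is

$$\begin{aligned}
G_N(z) &= E[z^{X_1 + \cdots + X_n}] = E[z^{X_1}] \cdots E[z^{X_n}] \\
&= G_{X_1}(z) \cdots G_{X_n}(z).
\end{aligned} \tag{7.11}$$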


Example 7.6

Find the generating function for a sum of $n$ independent, identically geometrically distributed random variables.
The generating function for a single geometric random variable is given by

$$G_X(z) = \frac{pz}{1 - qz}.$$

Therefore the generating function for a sum of $n$ such independent random variables is

$$G_N(z) = \left\{\frac{pz}{1 - qz}\right\}^n.$$

From Table 3.1, we see that this is the generating function of a negative binomial random variable with parameters $p$ and $n$.
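The conclusion of Example 7.6 can be verified coefficient by coefficient, since multiplying generating functions is the same as multiplying power series. The sketch below is a hedged illustration: the values of `p`, `n`, and the truncation degree `max_deg` are arbitrary choices, and the negative binomial pmf is taken in the form $P[N = k] = \binom{k-1}{n-1} p^n q^{k-n}$ for $k \ge n$, which is the form consistent with $G_X(z) = pz/(1 - qz)$.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b, max_deg):
    """Multiply two power series given as coefficient lists, truncated at max_deg."""
    out = [Fraction(0)] * (max_deg + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= max_deg:
                out[i + j] += ai * bj
    return out

p = Fraction(1, 3)          # illustrative success probability
q = 1 - p
n, max_deg = 4, 12          # number of summands and truncation degree (assumptions)

# Coefficient of z^k in G_X(z) = pz/(1 - qz) is P[X = k] = p q^(k-1), k >= 1.
geom = [Fraction(0)] + [p * q ** (k - 1) for k in range(1, max_deg + 1)]

# G_N(z) = G_X(z)^n: multiply the series n times, as in Eq. (7.11).
g_n = [Fraction(1)] + [Fraction(0)] * max_deg
for _ in range(n):
    g_n = poly_mul(g_n, geom, max_deg)

# Negative binomial pmf: P[N = k] = C(k-1, n-1) p^n q^(k-n) for k >= n, else 0.
neg_bin = [comb(k - 1, n - 1) * p ** n * q ** (k - n) if k >= n else Fraction(0)
           for k in range(max_deg + 1)]

print(g_n == neg_bin)
```

Truncating every product at `max_deg` does not affect the coefficients up to that degree, so the comparison is exact for the coefficients shown.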
*7.1.3 Sum of a Random Number of Random Variables

In some problems we are interested in the sum of a random number $N$ of iid random variables:

$$S_N = \sum_{k=1}^{N} X_k, \tag{7.12}$$

where $N$ is assumed to be a random variable that is independent of the $X_k$'s. For example, $N$ might be the number of computer jobs submitted in an hour and $X_k$ might be the time required to execute the $k$th job.

The mean of $S_N$ is found readily by using conditional expectation:

$$\begin{aligned}
E[S_N] &= E[E[S_N \mid N]] \\
&= E[N E[X]] \\
&= E[N]E[X].
\end{aligned} \tag{7.13}$$

The second equality follows from the fact that

$$E[S_N \mid N = n] = E\left[\sum_{k=1}^{n} X_k\right] = nE[X],$$

so $E[S_N \mid N] = NE[X]$.

The characteristic function of $S_N$ can also be found by using conditional expectation. From Eq. (7.10), we have that

$$E[e^{j\omega S_N} \mid N = n] = E[e^{j\omega(X_1 + \cdots + X_n)}] = \Phi_X(\omega)^n,$$

so

$$E[e^{j\omega S_N} \mid N] = \Phi_X(\omega)^N.$$

Therefore

$$\begin{aligned}
\Phi_{S_N}(\omega) &= E[E[e^{j\omega S_N} \mid N]] \\
&= E[\Phi_X(\omega)^N] \\
&= E[z^N]\big|_{z = \Phi_X(\omega)} \\
&= G_N(\Phi_X(\omega)).
\end{aligned} \tag{7.14}$$

That is, the characteristic function of $S_N$ is found by evaluating the generating function of $N$ at $z = \Phi_X(\omega)$.

Example 7.7

The number of jobs $N$ submitted to a computer in an hour is a geometric random variable with parameter $p$, and the job execution times are independent exponentially distributed random variables with mean $1/\alpha$. Find the pdf for the sum of the execution times of the jobs submitted in an hour.
The generating function for $N$ is

$$G_N(z) = \frac{p}{1 - qz},$$

and the characteristic function for an exponentially distributed random variable is

$$\Phi_X(\omega) = \frac{\alpha}{\alpha - j\omega}.$$

From Eq. (7.14), the characteristic function of $S_N$ is

$$\begin{aligned}
\Phi_{S_N}(\omega) &= \frac{p}{1 - q[\alpha/(\alpha - j\omega)]} \\
&= \frac{p(\alpha - j\omega)}{p\alpha - j\omega} \\
&= p + (1 - p)\frac{p\alpha}{p\alpha - j\omega}.
\end{aligned}$$

The pdf of $S_N$ is found by taking the inverse transform of the above expression:

$$f_{S_N}(x) = p\,\delta(x) + (1 - p)\,p\alpha\, e^{-p\alpha x}, \qquad x \ge 0.$$

The pdf has a direct interpretation: With probability $p$ there are no job arrivals and hence the total execution time is zero; with probability $(1 - p)$ there are one or more arrivals, and the total execution time is an exponential random variable with mean $1/p\alpha$.
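A short Monte Carlo run can make the interpretation of Example 7.7 concrete. This is a sketch under stated assumptions, not a definitive implementation: the parameter values, the seed, the number of trials, and the function name `random_sum` are all illustrative. The geometric variable is taken on $\{0, 1, \ldots\}$ with $P[N = k] = p q^k$, which is the form that matches $G_N(z) = p/(1 - qz)$.

```python
import random

def random_sum(rng, p, alpha):
    """One sample of S_N, with N geometric on {0, 1, ...}, P[N = k] = p(1-p)^k,
    and each X_k exponential with rate alpha (mean 1/alpha)."""
    total = 0.0
    while rng.random() >= p:        # each failure (prob. 1 - p) adds one more job
        total += rng.expovariate(alpha)
    return total

rng = random.Random(12345)          # fixed seed so the run is repeatable
p, alpha, trials = 0.5, 2.0, 100_000
samples = [random_sum(rng, p, alpha) for _ in range(trials)]

frac_zero = sum(s == 0.0 for s in samples) / trials   # estimates the impulse weight p at x = 0
mean_est = sum(samples) / trials                      # estimates E[S_N]
mean_theory = ((1 - p) / p) * (1 / alpha)             # E[N]E[X] from Eq. (7.13)

print(frac_zero, mean_est, mean_theory)
```

The fraction of zero totals should be close to $p$, and the sample average should be close to $E[N]E[X]$. The same mean follows from the pdf itself, since $(1 - p) \cdot 1/(p\alpha) = (q/p)(1/\alpha)$.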


7.2 THE SAMPLE MEAN AND THE LAWS OF LARGE NUMBERS

Let $X$ be a random variable for which the mean, $E[X] = \mu$, is unknown. Let $X_1, \ldots, X_n$ denote $n$ independent, repeated measurements of $X$; that is, the $X_j$'s are independent, identically distributed (iid) random variables with the same pdf as $X$. The sample mean of the sequence is used to estimate $E[X]$:

$$M_n = \frac{1}{n}\sum_{j=1}^{n} X_j. \tag{7.15}$$

In this section, we compute the expected value and variance of $M_n$ in order to assess the effectiveness of $M_n$ as an estimator for $E[X]$. We also investigate the behavior of $M_n$ as $n$ becomes large.

The following example shows that the relative frequency estimator for the prob- 
ability of an event is a special case of a sample mean. Thus the results derived below for 
the sample mean are also applicable to the relative frequency estimator. 


Example 7.8 Relative Frequency

Consider a sequence of independent repetitions of some random experiment, and let the random variable $I_j$ be the indicator function for the occurrence of event $A$ in the $j$th trial. The total number of occurrences of $A$ in the first $n$ trials is then

$$N_n = I_1 + I_2 + \cdots + I_n.$$

The relative frequency of event $A$ in the first $n$ repetitions of the experiment is then

$$f_A(n) = \frac{1}{n}\sum_{j=1}^{n} I_j. \tag{7.16}$$

Thus the relative frequency $f_A(n)$ is simply the sample mean of the random variables $I_j$.


The sample mean is itself a random variable, so it will exhibit random variation. A good estimator should have the following two properties: (1) On the average, it should give the correct value of the parameter being estimated, that is, $E[M_n] = \mu$; and (2) it should not vary too much about the correct value of this parameter, that is, $E[(M_n - \mu)^2]$ is small.

The expected value of the sample mean is given by

$$E[M_n] = E\left[\frac{1}{n}\sum_{j=1}^{n} X_j\right] = \frac{1}{n}\sum_{j=1}^{n} E[X_j] = \mu, \tag{7.17}$$

since $E[X_j] = E[X] = \mu$ for all $j$. Thus the sample mean is equal to $E[X] = \mu$, on the average. For this reason, we say that the sample mean is an unbiased estimator for $\mu$.

Equation (7.17) implies that the mean square error of the sample mean about $\mu$ is equal to the variance of $M_n$, that is,

$$E[(M_n - \mu)^2] = E[(M_n - E[M_n])^2].$$

Note that $M_n = S_n/n$, where $S_n = X_1 + X_2 + \cdots + X_n$. From Eq. (7.4), $\mathrm{VAR}[S_n] = n\,\mathrm{VAR}[X_j] = n\sigma^2$, since the $X_j$'s are iid random variables. Thus

$$\mathrm{VAR}[M_n] = \frac{1}{n^2}\mathrm{VAR}[S_n] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}. \tag{7.18}$$


Equation (7.18) states that the variance of the sample mean approaches zero as the number of samples is increased. This implies that the probability that the sample mean is close to the true mean approaches one as $n$ becomes very large. We can formalize this statement by using the Chebyshev inequality, Eq. (4.76):

$$P[|M_n - E[M_n]| \ge \varepsilon] \le \frac{\mathrm{VAR}[M_n]}{\varepsilon^2}.$$

Substituting for $E[M_n]$ and $\mathrm{VAR}[M_n]$, we obtain

$$P[|M_n - \mu| \ge \varepsilon] \le \frac{\sigma^2}{n\varepsilon^2}. \tag{7.19}$$

If we consider the complement of the event considered in Eq. (7.19), we obtain

$$P[|M_n - \mu| < \varepsilon] \ge 1 - \frac{\sigma^2}{n\varepsilon^2}. \tag{7.20}$$

Thus for any choice of error $\varepsilon$ and probability $1 - \delta$, we can select the number of samples $n$ so that $M_n$ is within $\varepsilon$ of the true mean with probability $1 - \delta$ or greater. The following example illustrates this.
Example 7.9

A voltage of constant, but unknown, value is to be measured. Each measurement $X_j$ is actually the sum of the desired voltage $v$ and a noise voltage $N_j$ of zero mean and standard deviation of 1 microvolt (μV):

$$X_j = v + N_j.$$

Assume that the noise voltages are independent random variables. How many measurements are required so that the probability that $M_n$ is within $\varepsilon = 1$ μV of the true mean is at least .99?

Each measurement $X_j$ has mean $v$ and variance 1, so from Eq. (7.20) we require that $n$ satisfy

$$1 - \frac{\sigma^2}{n\varepsilon^2} = 1 - \frac{1}{n} = .99.$$

This implies that $n = 100$.

Thus if we were to repeat the measurement 100 times and compute the sample mean, on the average, at least 99 times out of 100, the resulting sample mean will be within 1 μV of the true mean.


Note that if we let $n$ approach infinity in Eq. (7.20) we obtain

$$\lim_{n \to \infty} P[|M_n - \mu| < \varepsilon] = 1.$$

Equation (7.20) requires that the $X_j$'s have finite variance. It can be shown that this limit holds even if the variance of the $X_j$'s does not exist [Gnedenko, p. 203]. We state this more general result:

Weak Law of Large Numbers Let $X_1, X_2, \ldots$ be a sequence of iid random variables with finite mean $E[X] = \mu$, then for $\varepsilon > 0$,

$$\lim_{n \to \infty} P[|M_n - \mu| < \varepsilon] = 1. \tag{7.21}$$


The weak law of large numbers states that for a large enough fixed value of 
n, the sample mean using n samples will be close to the true mean with high prob- 
ability. The weak law of large numbers does not address the question about what 
happens to the sample mean as a function of n as we make additional measure- 
ments. This question is taken up by the strong law of large numbers, which we 
discuss next. 

Suppose we make a series of independent measurements of the same random variable. Let $X_1, X_2, \ldots$ be the resulting sequence of iid random variables with mean $\mu$. Now consider the sequence of sample means that results from the above measurements: $M_1, M_2, \ldots$, where $M_j$ is the sample mean computed using $X_1$ through $X_j$. The notion of statistical regularity discussed in Chapter 1 leads us to expect that this sequence of sample means converges to $\mu$, that is, we expect that with high probability, each particular sequence of sample means approaches $\mu$ and stays there, as shown in Fig. 7.1. In terms of probabilities, we expect the following:

$$P[\lim_{n \to \infty} M_n = \mu] = 1;$$

that is, with virtual certainty, every sequence of sample mean calculations converges to the true mean of the quantity. The proof of this result is well beyond the level of this course (see [Gnedenko, p. 216]), but we will have the opportunity in later sections to apply the result in various situations.

FIGURE 7.1
Convergence of sequence of sample means to E[X].


Strong Law of Large Numbers Let $X_1, X_2, \ldots$ be a sequence of iid random variables with finite mean $E[X] = \mu$ and finite variance, then

$$P[\lim_{n \to \infty} M_n = \mu] = 1. \tag{7.22}$$

Equation (7.22) appears similar to Eq. (7.21), but in fact it makes a dramatically different statement. It states that with probability 1, every sequence of sample mean calculations will eventually approach and stay close to $E[X] = \mu$. This is the type of convergence we expect in physical situations where statistical regularity holds.
With the strong law of large numbers we come full circle in the modeling process. 
We began in Chapter 1 by noting that statistical regularity is observed in many physical 
phenomena, and from this we deduced a number of properties of relative frequency. 
These properties were used to formulate a set of axioms from which we developed a 
mathematical theory of probability. We have now come full circle and shown that, 
under certain conditions, the theory predicts the convergence of sample means to ex- 
pected values. There are still gaps between the mathematical theory and the real world 
(i.e., we can never actually carry out an infinite number of measurements and compute 
an infinite number of sample means). Nevertheless, the strong law of large numbers 
demonstrates the remarkable consistency between the theory and the observed physi- 
cal behavior. 
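The behavior described by the laws of large numbers can be observed along a single simulated sample path. The following is a minimal sketch, assuming the measurements are rolls of a fair die so that $\mu = 3.5$; the die, the seed, the run length, and the function name `running_means` are illustrative assumptions. A finite run cannot establish convergence with probability 1; it only shows one sequence $M_1, M_2, \ldots$ settling near $\mu$.

```python
import random

def running_means(samples):
    """Return the sequence M_1, M_2, ... of sample means of the first n samples."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

rng = random.Random(2024)                              # fixed seed for repeatability
rolls = [rng.randint(1, 6) for _ in range(50_000)]     # fair die, mu = 3.5 (illustrative)
means = running_means(rolls)

# Print the sample mean at a few values of n to see it settle near mu.
for n in (10, 100, 1_000, 10_000, 50_000):
    print(n, round(means[n - 1], 4))
```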


We already indicated that relative frequencies are special cases of sample averages. If we apply the weak law of large numbers to the relative frequency of an event $A$, $f_A(n)$, in a sequence of independent repetitions of a random experiment, we obtain

$$\lim_{n \to \infty} P[|f_A(n) - P[A]| < \varepsilon] = 1. \tag{7.23}$$

If we apply the strong law of large numbers, we obtain

$$P[\lim_{n \to \infty} f_A(n) = P[A]] = 1. \tag{7.24}$$


Example 7.10

In order to estimate the probability of an event $A$, a sequence of Bernoulli trials is carried out and the relative frequency of $A$ is observed. How large should $n$ be in order to have a .95 probability that the relative frequency is within 0.01 of $p = P[A]$?

Let $X = I_A$ be the indicator function of $A$. From Table 3.1 we have that the mean of $I_A$ is $\mu = p$ and the variance is $\sigma^2 = p(1 - p)$. Since $p$ is unknown, $\sigma^2$ is also unknown. However, it is easy to show that $p(1 - p)$ is at most 1/4 for $0 \le p \le 1$. Therefore, by Eq. (7.19),

$$P[|f_A(n) - p| \ge \varepsilon] \le \frac{\sigma^2}{n\varepsilon^2} \le \frac{1}{4n\varepsilon^2}.$$

The desired accuracy is $\varepsilon = 0.01$ and the desired probability is

$$1 - .95 = \frac{1}{4n\varepsilon^2}.$$

We then solve for $n$ and obtain $n = 50{,}000$. It has already been pointed out that the Chebyshev inequality gives very loose bounds, so we expect that this value for $n$ is probably overly conservative. In the next section, we present a better estimate for the required value of $n$.
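The sample-size calculations in Examples 7.9 and 7.10 both solve $\sigma^2/(n\varepsilon^2) = \delta$ for $n$. The sketch below packages that step as a small helper; the function name `chebyshev_samples` and its argument names are hypothetical, and exact fractions are used so that rounding cannot push the result up by one.

```python
from fractions import Fraction
from math import ceil

def chebyshev_samples(variance, eps, delta):
    """Smallest n with variance / (n * eps^2) <= delta, so that the Chebyshev
    bound of Eq. (7.19) gives P[|M_n - mu| >= eps] <= delta."""
    variance, eps, delta = Fraction(variance), Fraction(eps), Fraction(delta)
    return ceil(variance / (eps ** 2 * delta))

# Example 7.9: noise variance 1, eps = 1 microvolt, 1 - delta = .99.
n_voltage = chebyshev_samples(1, 1, "0.01")

# Example 7.10: worst-case Bernoulli variance 1/4, eps = 0.01, 1 - delta = .95.
n_bernoulli = chebyshev_samples("1/4", "0.01", "0.05")

print(n_voltage, n_bernoulli)
```

These values reproduce the answers in the two examples. They are sufficient sample sizes under the Chebyshev bound, not the smallest sample sizes that achieve the stated probabilities.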


7.3 THE CENTRAL LIMIT THEOREM

Let $X_1, X_2, \ldots$ be a sequence of iid random variables with finite mean $\mu$ and finite variance $\sigma^2$, and let $S_n$ be the sum of the first $n$ random variables in the sequence:

$$S_n = X_1 + X_2 + \cdots + X_n. \tag{7.25}$$

In Section 7.1, we developed methods for determining the exact pdf of $S_n$. We now present the central limit theorem, which states that, as $n$ becomes large, the cdf of a properly normalized $S_n$ approaches that of a Gaussian random variable. This enables us to approximate the cdf of $S_n$ with that of a Gaussian random variable.

The central limit theorem explains why the Gaussian random variable appears in so many diverse applications. In nature, many macroscopic phenomena result from the addition of numerous independent, microscopic processes; this gives rise to the Gaussian random variable. In many man-made problems, we are interested in averages that often consist of the sum of independent random variables. This again gives rise to the Gaussian random variable.

From Example 7.2, we know that if the $X_j$'s are iid, then $S_n$ has mean $n\mu$ and variance $n\sigma^2$. The central limit theorem states that the cdf of a suitably normalized version of $S_n$ approaches that of a Gaussian random variable.
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(a) The cdf of the sum of five independent Bernoulli random variables with p = 1/2 and the cdf of a Gaussian random variable 
of the same mean and variance. (b) The cdf of the sum of 25 independent Bernoulli random variables with p = 1/2 and the cdf 
of a Gaussian random variable of the same mean and variance. 


Central Limit Theorem   Let S_n be the sum of n iid random variables with finite
mean E[X] = μ and finite variance σ², and let Z_n be the zero-mean, unit-variance
random variable defined by

\[ Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}, \tag{7.26a} \]

then

\[ \lim_{n \to \infty} P[Z_n \le z] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\,dx. \tag{7.26b} \]

Note that Z_n is sometimes written in terms of the sample mean:

\[ Z_n = \sqrt{n}\,\frac{M_n - \mu}{\sigma}. \tag{7.27} \]


The amazing part about the central limit theorem is that the summands X_i can
have any distribution as long as they have a finite mean and finite variance. This gives
the result its wide applicability.

Figures 7.2 through 7.4 compare the exact cdf and the Gaussian approximation 
for the sums of Bernoulli, uniform, and exponential random variables, respectively. In 
all three cases, it can be seen that the approximation improves as the number of terms 
in the sum increases. The proof of the central limit theorem is discussed in the last part 
of this section. 
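
The comparison described for the Bernoulli case can be reproduced numerically. The following is a minimal sketch, not taken from the text: the function names are illustrative, and the gap is measured only at the jump points k = 0, 1, ..., n of the exact cdf, which is an assumption about how to compare a staircase cdf with a continuous one.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact cdf at k of the sum of n iid Bernoulli(p) random variables."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def max_cdf_gap(n, p=0.5):
    """Largest gap, over the jump points k = 0..n, between the exact cdf
    and the cdf of a Gaussian with the same mean np and variance np(1-p)."""
    gauss = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return max(abs(binomial_cdf(k, n, p) - gauss.cdf(k)) for k in range(n + 1))


# The two cases mentioned for the Bernoulli sums: n = 5 and n = 25.
for n in (5, 25):
    print(n, max_cdf_gap(n))
```

Under these assumptions the gap for n = 25 comes out smaller than for n = 5, in line with the statement that the approximation improves as the number of terms grows.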


Example 7.11 


Suppose that orders at a restaurant are iid random variables with mean μ = $8 and standard
deviation σ = $2. Estimate the probability that the first 100 customers spend a total of more
than $840. Estimate the probability that the first 100 customers spend a total of between $780
and $820.

FIGURE 7.3
The cdf of the sum of five independent discrete, uniform random variables from
the set {0, 1, ..., 9} and the cdf of a Gaussian random variable of the same
mean and variance.

FIGURE 7.4
(a) The cdf of the sum of five independent exponential random variables of mean 1 and the cdf of a Gaussian random variable
of the same mean and variance. (b) The cdf of the sum of 50 independent exponential random variables of mean 1 and the
cdf of a Gaussian random variable of the same mean and variance.


Let X_k denote the expenditure of the kth customer; then the total spent by the first 100
customers is

\[ S_{100} = X_1 + X_2 + \cdots + X_{100}. \]

The mean of S_100 is nμ = 800 and the variance is nσ² = 400. Figure 7.5 shows the pdf of S_100,
where it can be seen that the pdf is highly concentrated about the mean. The normalized form of
S_100 is

\[ Z_{100} = \frac{S_{100} - 800}{20}. \]

FIGURE 7.5
Gaussian pdf approximations of S_100 and S_129 in Examples 7.11 and 7.12.


Thus

\[ P[S_{100} > 840] = P\left[ Z_{100} > \frac{840 - 800}{20} \right] \approx Q(2) = 2.28(10^{-2}), \]

where we used Table 4.2 to evaluate Q(2). Similarly,

\[ P[780 \le S_{100} \le 820] = P[-1 \le Z_{100} \le 1] \approx 1 - 2Q(1) = .682. \]
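
The two estimates in Example 7.11 can be checked without a table. This is a minimal sketch that assumes the Gaussian approximation is acceptable for n = 100; `q_function` and the variable names are illustrative and not from the text.

```python
from math import sqrt
from statistics import NormalDist


def q_function(x):
    """Gaussian tail probability Q(x) = P[Z > x] for a standard normal Z."""
    return 1.0 - NormalDist().cdf(x)


mu, sigma, n = 8.0, 2.0, 100        # per-order mean, standard deviation, orders
mean_total = n * mu                 # n * mu = 800
std_total = sigma * sqrt(n)         # sqrt(n * sigma^2) = 20

# P[S_100 > 840] is approximated by Q((840 - 800)/20) = Q(2).
p_over_840 = q_function((840 - mean_total) / std_total)

# P[780 <= S_100 <= 820] is approximated by 1 - 2Q(1).
p_780_to_820 = 1.0 - 2.0 * q_function((820 - mean_total) / std_total)

print(p_over_840, p_780_to_820)
```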


Example 7.12 


In Example 7.11, after how many orders can we be 90% sure that the total spent by all customers 
is more than $1000? 
The problem here is to find the value of n for which

\[ P[S_n > 1000] = .90. \]

S_n has mean 8n and variance 4n. Proceeding as in the previous example, we have

\[ P[S_n > 1000] = P\left[ Z_n > \frac{1000 - 8n}{2\sqrt{n}} \right] = .90. \]


Using the fact that Q(-x) = 1 - Q(x), Table 4.3 implies that n must satisfy

\[ \frac{1000 - 8n}{2\sqrt{n}} = -1.2815, \]

which yields the following quadratic equation for √n:

\[ 8n - 1.2815(2)\sqrt{n} - 1000 = 0. \]

The positive root of the equation yields √n = 11.34, or n = 128.6. Figure 7.5 shows the pdf for S_129.
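
The quadratic in √n can be solved directly. The sketch below is an illustration under the same Gaussian approximation; the function name `orders_needed` and its parameters are hypothetical, and the quantile comes from the standard library rather than from Table 4.3.

```python
from math import sqrt
from statistics import NormalDist


def orders_needed(target, mu, sigma, confidence):
    """Value of n for which P[S_n > target] is approximately `confidence`.

    With z chosen so that Q(-z) = confidence, the condition
    (target - mu*n) / (sigma*sqrt(n)) = -z becomes the quadratic
    mu*x**2 - z*sigma*x - target = 0 in x = sqrt(n).
    """
    z = NormalDist().inv_cdf(confidence)
    x = (z * sigma + sqrt((z * sigma) ** 2 + 4.0 * mu * target)) / (2.0 * mu)
    return x * x  # n = (sqrt(n))^2, taken from the positive root


print(orders_needed(1000.0, 8.0, 2.0, 0.90))
```

Since n must be an integer, the result is rounded up in practice, which is why the example refers to S_129.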


Example 7.13 


The times between events in a certain random experiment are iid exponential random variables with
mean m seconds. Find the probability that the 1000th event occurs in the time interval (1000 ± 50)m.

Let X_j be the time between events and let S_n be the time of the nth event; then S_n is given
by Eq. (7.25). From Table 4.1, the mean and variance of X_j are given by E[X_j] = m and
VAR[X_j] = m². The mean and variance of S_n are then E[S_n] = nE[X_j] = nm and
VAR[S_n] = n VAR[X_j] = nm². The central limit theorem then gives

\[ P[950m \le S_{1000} \le 1050m] \approx P\left[ \frac{950m - 1000m}{m\sqrt{1000}} \le Z_n \le \frac{1050m - 1000m}{m\sqrt{1000}} \right] \]
\[ = Q(-1.58) - Q(1.58) \]
\[ = 1 - 2Q(1.58) \]
\[ = 1 - 2(0.0567) = .8866. \]

Thus as n becomes large, S_n is very likely to be close to its mean nm. We can therefore conjecture
that the long-term average rate at which events occur is

\[ \frac{n \text{ events}}{S_n \text{ seconds}} = \frac{n}{nm} = \frac{1}{m} \text{ events/second}. \tag{7.28} \]

The calculation of event occurrence rates and related averages is discussed in Section 7.5.


7.3.1 Gaussian Approximation for Binomial Probabilities


We found in Chapter 2 that the binomial random variable becomes difficult to compute
directly for large n because of the need to calculate factorial terms. A particularly
important application of the central limit theorem is in the approximation of binomial
probabilities. Since the binomial random variable is a sum of iid Bernoulli random
variables (which have finite mean and variance), its cdf approaches that of a Gaussian
random variable. Let X be a binomial random variable with mean np and variance
np(1 - p), and let Y be a Gaussian random variable with the same mean and variance;
then by the central limit theorem for n large the probability that X = k is approximately
equal to the integral of the Gaussian pdf in an interval of unit length about k, as
shown in Fig. 7.6:

\[ P[X = k] \approx P\left[ k - \tfrac{1}{2} < Y < k + \tfrac{1}{2} \right] = \frac{1}{\sqrt{2\pi np(1-p)}} \int_{k-1/2}^{k+1/2} e^{-(x-np)^2/2np(1-p)}\,dx. \tag{7.29} \]

FIGURE 7.6
(a) Gaussian approximation for binomial probabilities with n = 5 and p = 1/2.
(b) Gaussian approximation for binomial with n = 25 and p = 1/2.


The above approximation can be simplified by approximating the integral by the product
of the integrand at the center of the interval of integration (that is, x = k) and the
length of the interval of integration (one):

\[ P[X = k] \approx \frac{1}{\sqrt{2\pi np(1-p)}}\, e^{-(k-np)^2/2np(1-p)}. \tag{7.30} \]

Figures 7.6(a) and 7.6(b) compare the binomial probabilities and the Gaussian approximation
using Eq. (7.30).
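
The comparison between the exact binomial probabilities and Eq. (7.30) can be sketched as follows. The function names are illustrative, and using the largest absolute error over k as the figure of merit is an assumption made here, not something stated in the text.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(k, n, p):
    """Exact P[X = k] for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def gaussian_pmf_estimate(k, n, p):
    """Eq. (7.30): Gaussian pdf with mean np, variance np(1-p), evaluated at k."""
    var = n * p * (1 - p)
    return exp(-(k - n * p) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)


def worst_error(n, p):
    """Largest absolute difference between the two over k = 0..n."""
    return max(abs(binomial_pmf(k, n, p) - gaussian_pmf_estimate(k, n, p))
               for k in range(n + 1))


# The two cases of Fig. 7.6: n = 5 and n = 25 with p = 1/2.
for n in (5, 25):
    print(n, worst_error(n, 0.5))
```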


Example 7.14 


In Example 7.10 in Section 7.2, we used the Chebyshev inequality to estimate the number of 
samples required for there to be a .95 probability that the relative frequency estimate for the 
probability of an event A would be within 0.01 of P[A]. We now estimate the required number of 
samples using the Gaussian approximation for the binomial distribution. 

Let f_A(n) be the relative frequency of A in n Bernoulli trials. Since f_A(n) has mean p and
variance p(1 - p)/n, then

\[ Z_n = \frac{f_A(n) - p}{\sqrt{p(1-p)/n}} \]

has zero mean and unit variance, and is approximately Gaussian for n sufficiently large. The
probability of interest is

\[ P[|f_A(n) - p| < \varepsilon] \approx P\left[ |Z_n| < \frac{\varepsilon\sqrt{n}}{\sqrt{p(1-p)}} \right] = 1 - 2Q\left( \frac{\varepsilon\sqrt{n}}{\sqrt{p(1-p)}} \right). \]

The above probability cannot be computed because p is unknown. However, it can be easily shown
that p(1 - p) ≤ 1/4 for p in the unit interval. It then follows that for such p, √(p(1 - p)) ≤ 1/2,
and since Q(x) decreases with increasing argument

\[ P[|f_A(n) - p| < \varepsilon] \ge 1 - 2Q(2\varepsilon\sqrt{n}). \]

We want the above probability to equal .95. This implies that Q(2ε√n) = (1 - .95)/2 = .025.
From Table 4.2, we see that the argument of Q(x) should be approximately 1.95, thus

\[ 2\varepsilon\sqrt{n} = 1.95. \]

Solving for n, we obtain

\[ n = (.975)^2/\varepsilon^2 = 9506. \]
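
The two sample-size estimates, from the Chebyshev inequality and from the Gaussian approximation, can be placed side by side. This is a hedged sketch with illustrative function names. Note that the quantile computed by the standard library is about 1.96 rather than the table value 1.95 used above, so the computed n is slightly larger than 9506.

```python
from statistics import NormalDist


def samples_chebyshev(eps, prob):
    """Solve 1 - prob = 1/(4 n eps^2) for n (worst case p(1-p) = 1/4)."""
    return 1.0 / (4.0 * (1.0 - prob) * eps**2)


def samples_from_q_argument(arg, eps):
    """Solve 2 eps sqrt(n) = arg for n."""
    return (arg / (2.0 * eps)) ** 2


def samples_gaussian(eps, prob):
    """Pick the argument so that Q(arg) = (1 - prob)/2, then solve for n."""
    arg = NormalDist().inv_cdf(1.0 - (1.0 - prob) / 2.0)
    return samples_from_q_argument(arg, eps)


print(samples_chebyshev(0.01, 0.95))        # Chebyshev estimate
print(samples_from_q_argument(1.95, 0.01))  # table value 1.95, as in the example
print(samples_gaussian(0.01, 0.95))         # quantile from the standard library
```

Either Gaussian figure is roughly one fifth of the Chebyshev figure, which illustrates how conservative the Chebyshev bound is here.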


7.3.2 Chernoff Bound for Binomial Random Variable


The Gaussian pdf extends over the entire real line. When taking the sum of random 
variables that have a finite range, such as the binomial random variable, the central 
limit theorem can be inaccurate at the extreme values of the sum. The Chernoff bound 
introduced in Chapter 3 gives better estimates. 

The Chernoff bound for the binomial is given by:

\[ P[X \ge a] \le e^{-sa} E[e^{sX}] = e^{-sa} E[(e^{s})^{X}] = e^{-sa} G_N(e^{s}) = e^{-sa}(q + pe^{s})^{n}, \]

where s > 0, and G_N(z) is the pgf for the binomial random variable. To minimize the
bound we take the derivative with respect to s and set it to zero:

\[ 0 = \frac{d}{ds}\, e^{-sa} G_N(e^{s}) = -a e^{-sa}(q + pe^{s})^{n} + e^{-sa} e^{s} np\,(q + pe^{s})^{n-1} \]
\[ a(q + pe^{s}) = e^{s} np, \]

where the second line results after canceling common terms. The optimum s and the
associated bound are:

\[ e^{s} = \frac{aq}{p(n - a)} \]
\[ P[X \ge a] \le \left( \frac{p(n-a)}{aq} \right)^{a} \left( q + p\,\frac{aq}{p(n-a)} \right)^{n} = \left( \frac{p(n-a)}{aq} \right)^{a} \left( \frac{qn}{n-a} \right)^{n} \]
\[ = \left( \frac{p(1 - a/n)}{(a/n)q} \right)^{a} \left( \frac{q}{1 - a/n} \right)^{n} = \frac{p^{a} q^{n-a}}{(a/n)^{a} (1 - a/n)^{n-a}}. \]

Example 7.15

Compare the central limit estimate for P[X ≥ x] with the Chernoff bound for the binomial
random variable with n = 100 and p = 0.5.
The central limit gives the estimate:

\[ P[X \ge x] \approx Q\left( \frac{x - np}{\sqrt{npq}} \right) = Q\left( \frac{x - 50}{5} \right). \]

The Chernoff bound is:

\[ P[X \ge x] \le \frac{(1/2)^{100}}{(x/100)^{x} (1 - x/100)^{100-x}}. \]

Figure 7.7 shows a comparison of the exact values of the tail distribution with the Chernoff
bound and the estimate from the central limit theorem. The central limit theorem estimate is
more accurate than the Chernoff bound up to about x = 86. At the extreme values of x, the
Chernoff bound remains accurate while the central limit estimate loses its accuracy.

FIGURE 7.7
Comparison of Chernoff bound and central limit theorem.
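
The three quantities compared in Example 7.15 can be computed directly. The sketch below is illustrative only: the function names are not from the text, the central limit estimate is evaluated without a continuity correction (as in the example), and `erfc` is used so that very small Gaussian tails are not lost to rounding.

```python
from math import comb, erfc, sqrt

N_TRIALS, P_SUCCESS = 100, 0.5


def exact_tail(x, n=N_TRIALS, p=P_SUCCESS):
    """Exact P[X >= x] for a binomial(n, p) random variable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))


def clt_tail(x, n=N_TRIALS, p=P_SUCCESS):
    """Central limit estimate Q((x - np)/sqrt(npq))."""
    z = (x - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * erfc(z / sqrt(2.0))


def chernoff_tail(x, n=N_TRIALS, p=P_SUCCESS):
    """Chernoff bound p^x q^(n-x) / ((x/n)^x (1 - x/n)^(n-x)), for np < x < n."""
    a = x / n
    return (p**x * (1 - p)**(n - x)) / (a**x * (1 - a)**(n - x))


for x in (60, 75, 95):
    print(x, exact_tail(x), clt_tail(x), chernoff_tail(x))
```

Comparing the printed values on a ratio (logarithmic) scale, as a tail plot would, shows the central limit estimate closer to the exact value for moderate x and the Chernoff bound closer in the far tail.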


*7.3.3 Proof of the Central Limit Theorem


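We now sketch a proof of the central limit theorem. First note that

\[ Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{1}{\sigma\sqrt{n}} \sum_{k=1}^{n} (X_k - \mu). \]

The characteristic function of Z_n is given by

\[ \Phi_{Z_n}(\omega) = E\left[ e^{j\omega Z_n} \right] = E\left[ \exp\left\{ \frac{j\omega}{\sigma\sqrt{n}} \sum_{k=1}^{n} (X_k - \mu) \right\} \right] \]
\[ = \prod_{k=1}^{n} E\left[ e^{j\omega(X_k - \mu)/\sigma\sqrt{n}} \right] \]
\[ = \left\{ E\left[ e^{j\omega(X - \mu)/\sigma\sqrt{n}} \right] \right\}^{n}. \tag{7.31} \]

The third equality follows from the independence of the X_k's and the last equality follows
from the fact that the X_k's are identically distributed.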

By expanding the exponential in the expression, we obtain an expression in terms
of n and the central moments of X:

\[ E\left[ e^{j\omega(X-\mu)/\sigma\sqrt{n}} \right] = E\left[ 1 + \frac{j\omega}{\sigma\sqrt{n}}(X - \mu) + \frac{(j\omega)^2}{2!\,n\sigma^2}(X - \mu)^2 + R(\omega) \right] \]
\[ = 1 + \frac{j\omega}{\sigma\sqrt{n}} E[(X - \mu)] + \frac{(j\omega)^2}{2n\sigma^2} E[(X - \mu)^2] + E[R(\omega)]. \]

Noting that E[(X - μ)] = 0 and E[(X - μ)²] = σ², we have

\[ E\left[ e^{j\omega(X-\mu)/\sigma\sqrt{n}} \right] = 1 - \frac{\omega^2}{2n} + E[R(\omega)]. \tag{7.32} \]

The term E[R(ω)] can be neglected relative to ω²/2n as n becomes large. If we substitute
Eq. (7.32) into Eq. (7.31), we obtain

\[ \Phi_{Z_n}(\omega) = \left\{ 1 - \frac{\omega^2}{2n} \right\}^{n} \to e^{-\omega^2/2} \quad \text{as } n \to \infty. \]

The latter expression is the characteristic function of a zero-mean, unit-variance Gaussian
random variable. Thus the cdf of Z_n approaches the cdf of a zero-mean, unit-variance
Gaussian random variable.
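
The limit used in the final step of the proof sketch, that (1 - ω²/2n)^n approaches e^{-ω²/2}, can be observed numerically. This is a small illustration with hypothetical function names; it ignores the remainder term E[R(ω)], exactly as the sketch of the proof does.

```python
from math import exp


def truncated_cf(omega, n):
    """{1 - omega^2/(2n)}^n, the characteristic function with R(omega) dropped."""
    return (1.0 - omega**2 / (2.0 * n)) ** n


def gaussian_cf(omega):
    """Characteristic function of a zero-mean, unit-variance Gaussian."""
    return exp(-omega**2 / 2.0)


omega = 1.0
for n in (10, 1000, 100000):
    print(n, truncated_cf(omega, n), gaussian_cf(omega))
```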


7.4 CONVERGENCE OF SEQUENCES OF RANDOM VARIABLES


In Section 7.2 we discussed the convergence of the sequence of arithmetic averages M_n
of iid random variables to the expected value μ:

\[ M_n \to \mu \quad \text{as } n \to \infty. \tag{7.33} \]

The weak law and strong law of large numbers describe two ways in which the sequence
of random variables M_n converges to the constant value given by μ. In this section we
consider the more general situation where a sequence of random variables (usually not
iid) X_1, X_2, ... converges to some random variable X:

\[ X_n \to X \quad \text{as } n \to \infty. \tag{7.34} \]

We will describe several ways in which this convergence can take place. Note that
Eq. (7.33) is a special case of Eq. (7.34) where the limiting random variable X is given
by the constant μ.

To understand the meaning of Eq. (7.34), we first need to revisit the definition
of a vector random variable X = (X_1, X_2, ..., X_n). X was defined as a function
that assigns a vector of real values to each outcome ζ from some sample
space S:

\[ \mathbf{X}(\zeta) = (X_1(\zeta), X_2(\zeta), \ldots, X_n(\zeta)). \]

The randomness in the vector random variable was induced by the randomness in the
underlying probability law governing the selection of ζ. We obtain a sequence of random
variables by letting n increase without bound, that is, a sequence of random variables X is
a function that assigns a countably infinite number of real values to each outcome ζ from
some sample space S:¹

\[ \mathbf{X}(\zeta) = (X_1(\zeta), X_2(\zeta), \ldots, X_n(\zeta), \ldots). \tag{7.35} \]

From now on, we will use the notation {X_n(ζ)} or {X_n} instead of X(ζ) to denote the
sequence of random variables.

Equation (7.35) shows that a sequence of random variables can be viewed as a sequence
of functions of ζ. On the other hand, it is more natural to instead imagine that
each point in S, say ζ, produces a particular sequence of real numbers,

\[ x_1, x_2, x_3, \ldots, \tag{7.36} \]

where x_1 = X_1(ζ), x_2 = X_2(ζ), and so on. The sequence in Eq. (7.36) is called the
sample sequence for the point ζ.


¹ In Chapter 8, we will see that this is also the definition of a discrete-time stochastic process.

Example 7.16


Let ζ be selected at random from the interval S = [0, 1], where we assume that the probability
that ζ is in a subinterval of S is equal to the length of the subinterval. For n = 1, 2, ... we define
the sequence of random variables

\[ V_n(\zeta) = \zeta\left( 1 - \frac{1}{n} \right). \]

The two ways of looking at sequences of random variables are evident here. First, we can view V_n(ζ)
as a sequence of functions of ζ, as shown in Fig. 7.8(a). Alternatively, we can imagine that we first
perform the random experiment that yields ζ, and that we then observe the corresponding
sequence of real numbers V_n(ζ), as shown in Fig. 7.8(b).

FIGURE 7.8
Two ways of looking at sequences of random variables.
(a) Sequence of random variables as a sequence of functions of ζ.
(b) Sequence of random variables as a sequence of real numbers determined by ζ.

The standard methods from calculus can be used to determine the convergence
of the sample sequence for each point ζ. Intuitively, we say that the sequence of real
numbers x_n converges to the real number x if the difference |x_n - x| approaches zero
as n approaches infinity. More formally, we say that:


The sequence x_n converges to x if, given any ε > 0, we can specify an integer N such that
for all values of n beyond N we can guarantee that |x_n - x| < ε.


Thus if a sequence converges, then for any ε we can find an N so that the sequence
remains inside a 2ε corridor about x, as shown in Fig. 7.9(a).


FIGURE 7.9
Sample sequences and convergence types.
(a) Convergence of a sequence of numbers.
(b) Almost-sure convergence.
(c) Convergence in probability.




If we make ε smaller, N becomes larger. Hence we arrive at our intuitive view
that x_n becomes closer and closer to x. If the limiting value x is not known, we can still
determine whether a sequence converges by applying the Cauchy criterion:


The sequence x_n converges if and only if, given ε > 0, we can specify integer N' such that
for m and n greater than N', |x_n - x_m| < ε.


The Cauchy criterion states that the maximum variation in the sequence for points
beyond N' is less than ε.


Example 7.17 


Let V_n(ζ) be the sequence of random variables from Example 7.16. Does the sequence of real
numbers corresponding to a fixed ζ converge?

From Fig. 7.8(a), we expect that for a fixed value ζ, V_n(ζ) will converge to the limit ζ.
Therefore, we consider the difference between the nth number in the sequence and the limit:

\[ |V_n(\zeta) - \zeta| = \left| \zeta\left( 1 - \frac{1}{n} \right) - \zeta \right| = \left| \frac{\zeta}{n} \right| < \frac{1}{n}, \]

where the last inequality follows from the fact that ζ is always less than one. In order to keep the
above difference less than ε, we choose n so that

\[ \frac{1}{n} < \varepsilon, \]

that is, we select n > N = 1/ε. Thus the sequence of real numbers V_n(ζ) converges to ζ.
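
The choice N = 1/ε can be checked numerically for a few sample points. This is a minimal sketch with hypothetical names; taking the integer part of 1/ε as N is an assumption made so that N is an integer, and it still guarantees 1/n < ε for every n > N.

```python
from math import floor


def v(n, zeta):
    """V_n(zeta) = zeta * (1 - 1/n), the sequence from Example 7.16."""
    return zeta * (1.0 - 1.0 / n)


def corridor_start(eps):
    """An integer N such that 1/n < eps for every n > N."""
    return floor(1.0 / eps)


eps = 0.01
N = corridor_start(eps)
# Largest deviation from the limit zeta over a stretch of indices beyond N.
worst = max(abs(v(n, zeta) - zeta)
            for n in range(N + 1, N + 200)
            for zeta in (0.0, 0.25, 0.5, 1.0))
print(N, worst)
```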


When we talk about the convergence of sequences of random variables, we are 
concerned with questions such as: Do all (or almost all) sample sequences converge, 
and if so, do they all converge to the same values or to different values? The first two 
definitions of convergence address these questions. 


Sure Convergence: The sequence of random variables {X_n(ζ)} converges
surely to the random variable X(ζ) if the sequence of functions X_n(ζ)
converges to the function X(ζ) as n → ∞ for all ζ in S:

\[ X_n(\zeta) \to X(\zeta) \quad \text{as } n \to \infty \text{ for all } \zeta \in S. \]

Sure convergence requires that the sample sequence corresponding to every ζ
converges. Note that it does not require that all the sample sequences converge to the
same values; that is, the sample sequences for different points ζ and ζ' can converge to
different values.


Almost-Sure Convergence: The sequence of random variables {X_n(ζ)}
converges almost surely to the random variable X(ζ) if the sequence of functions
X_n(ζ) converges to the function X(ζ) as n → ∞ for all ζ in S, except possibly
on a set of probability zero; that is,

\[ P[\zeta : X_n(\zeta) \to X(\zeta) \text{ as } n \to \infty] = 1. \tag{7.37} \]

In Fig. 7.9(b) we illustrate almost-sure convergence for the case where sample
sequences converge to the same value x; we see that almost all sequences must eventually
enter and remain inside a 2ε corridor. In almost-sure convergence some of the
sample sequences may not converge, but these must all belong to ζ's that are in a set
that has probability zero.

The strong law of large numbers is an example of almost-sure convergence. Note 
that sure convergence implies almost-sure convergence. 


Example 7.18 


Let ζ be selected at random from the interval S = [0, 1], where we assume that the probability
that ζ is in a subinterval of S is equal to the length of the subinterval. For n = 1, 2, ... we define
the following five sequences of random variables:

\[ U_n(\zeta) = \frac{\zeta}{n} \]
\[ V_n(\zeta) = \zeta\left( 1 - \frac{1}{n} \right) \]
\[ W_n(\zeta) = \zeta e^{n} \]
\[ Y_n(\zeta) = \cos 2\pi n \zeta \]
\[ Z_n(\zeta) = e^{-n(n\zeta - 1)}. \]

Which of these sequences converge surely? almost surely? Identify the limiting random variable.
The sequence U_n(ζ) converges to 0 for all ζ, and hence surely:

\[ U_n(\zeta) \to U(\zeta) = 0 \quad \text{as } n \to \infty \text{ for all } \zeta \in S. \]

Note that in this case all sample sequences converge to the same value, namely zero.
The sequence V_n(ζ) converges to ζ for all ζ, and hence surely:

\[ V_n(\zeta) \to V(\zeta) = \zeta \quad \text{as } n \to \infty \text{ for all } \zeta \in S. \]

In this case all sample sequences converge to different values, and the limiting random variable
V(ζ) is a uniform random variable on the unit interval.

The sequence W_n(ζ) converges to 0 for ζ = 0, but diverges to infinity for all other values
of ζ. Thus this sequence of random variables does not converge.

The sequence Y_n(ζ) converges to 1 for ζ = 0 and ζ = 1, but oscillates between -1 and 1
for all other values of ζ. Thus this sequence of random variables does not converge.

The sequence Z_n(ζ) is an interesting case. For ζ = 0, we have

\[ Z_n(0) = e^{n} \to \infty \quad \text{as } n \to \infty. \]

On the other hand, for ζ > 0 and for values of n > 1/ζ, the sequence Z_n(ζ) decreases
exponentially to zero, thus:

\[ Z_n(\zeta) \to 0 \quad \text{for all } \zeta > 0. \]

But P[ζ > 0] = 1, thus Z_n(ζ) converges to zero almost surely. However, Z_n(ζ) does not converge
surely to zero.
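
The behavior of the five sequences in Example 7.18 can be tabulated for a fixed outcome. The sketch below uses illustrative function names and one arbitrarily chosen sample point; it only evaluates the definitions and proves nothing about convergence.

```python
from math import cos, exp, pi


def u(n, zeta):
    return zeta / n


def v(n, zeta):
    return zeta * (1.0 - 1.0 / n)


def w(n, zeta):
    return zeta * exp(n)


def y(n, zeta):
    return cos(2.0 * pi * n * zeta)


def z_seq(n, zeta):
    return exp(-n * (n * zeta - 1.0))


zeta = 0.3  # one fixed outcome; any point of (0, 1] could be used instead
for n in (1, 5, 10, 50):
    print(n, u(n, zeta), v(n, zeta), w(n, zeta), y(n, zeta), z_seq(n, zeta))

# The exceptional point zeta = 0, where Z_n grows without bound.
print([z_seq(n, 0.0) for n in (1, 5, 10)])
```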

The dependence of the sequence of random variables on ζ is not always evident,
as shown by the following examples.


Example 7.19 iid Bernoulli Random Variables 


Let the sequence of random variables X_n(ζ) consist of independent equiprobable Bernoulli random
variables, that is,

\[ P[X_n(\zeta) = 0] = \frac{1}{2} = P[X_n(\zeta) = 1]. \]

Does this sequence of random variables converge?

This sequence of random variables will generate sample sequences consisting of all 
possible sequences of 0’s and 1’s. In order for a sample sequence to converge, it must eventually 
stay equal to zero (or one) for all remaining values of n. However, the probability of obtaining all 
zeros (or all ones) in an infinite number of Bernoulli trials is zero. Hence the sample sequences 
that converge have zero probability, and therefore this sequence of random variables does not 
converge. 


Example 7.20 


An urn contains 2 black balls and 2 white balls. At time n a ball is selected at random from the
urn, and the color is noted. If the number of balls of this color is greater than the number of balls
of the other color, then the ball is put back in the urn; otherwise, the ball is left out. Let X_n(ζ) be
the number of black balls in the urn after the nth draw. Does this sequence of random variables
converge?

The first draw is the critical draw. Suppose the first draw is black, then the black ball that
is selected will be left out. Thereafter, each time a white ball is selected it will be put back in, and
when the remaining black ball is selected it will be left out. Thus with probability one, the black
ball will eventually be selected, and X_n(ζ) will converge to zero. On the other hand, if a white
ball is selected in the first draw, then eventually the remaining white ball will be removed, and
hence with probability one X_n(ζ) will converge to 2. Thus X_n(ζ) is equally likely to eventually
converge to 0 or 2, that is,

\[ X_n(\zeta) \to X(\zeta) \quad \text{as } n \to \infty \text{ almost surely}, \]

where

\[ P[X(\zeta) = 0] = \frac{1}{2} = P[X(\zeta) = 2]. \]


In order to determine whether a sequence of random variables converges
almost surely, we need to know the probability law that governs the selection of ζ and
the relation between ζ and the sequence (as in Example 7.16), or the sequence must
be sufficiently simple that we can determine the convergence directly (as in Examples
7.19 and 7.20). In general it is easier to deal with other, "weaker" types of convergence
that are much easier to verify. For example, we may require that at a particular time n_0,
most sample sequences X_{n_0} be close to X in the sense that E[(X_{n_0} - X)²] is small.
This requirement focuses on a particular time instant and, unlike almost-sure
convergence, it does not address the behavior of entire sample sequences. It leads to the
following type of convergence:


Mean Square Convergence: The sequence of random variables {X_n(ζ)}
converges in the mean square sense to the random variable X(ζ) if

\[ E[(X_n(\zeta) - X(\zeta))^2] \to 0 \quad \text{as } n \to \infty. \tag{7.38a} \]

We denote mean square convergence by (limit in the mean)

\[ \text{l.i.m. } X_n(\zeta) = X(\zeta) \quad \text{as } n \to \infty. \tag{7.38b} \]

Mean square convergence is of great practical interest in electrical engineering
applications because of its analytical simplicity and because of the interpretation of
E[(X_n - X)²] as the "power" in an error signal.

The Cauchy criterion can be used to ascertain convergence in the mean square
sense when the limiting random variable X is not known:


Cauchy Criterion: The sequence of random variables {X_n(ζ)} converges in
the mean square sense if and only if

\[ E[(X_n(\zeta) - X_m(\zeta))^2] \to 0 \quad \text{as } n \to \infty \text{ and } m \to \infty. \tag{7.39} \]


Example 7.21 


Does the sequence V_n(ζ) in Example 7.18 converge in the mean square sense?
In Example 7.18, we found that V_n(ζ) converges surely to ζ. We therefore consider

\[ E[(V_n(\zeta) - \zeta)^2] = E\left[ \left( \frac{\zeta}{n} \right)^2 \right] = \int_0^1 \left( \frac{\zeta}{n} \right)^2 d\zeta = \frac{1}{3n^2}, \]

where we have used the fact that ζ is uniformly distributed in the interval [0, 1]. As n approaches
infinity, the mean square error approaches zero, and so we have convergence in the mean square
sense.


Mean square convergence occurs if the second moment of the error X_n - X
approaches zero as n approaches infinity. This implies that as n increases, an increasing
proportion of sample sequences are close to X; however, it does not imply that all such
sequences remain close to X as in the case of almost-sure convergence. This difference
will become apparent with the next type of convergence:


Convergence in Probability: The sequence of random variables {X_n(ζ)}
converges in probability to the random variable X(ζ) if, for any ε > 0,

\[ P[|X_n(\zeta) - X(\zeta)| > \varepsilon] \to 0 \quad \text{as } n \to \infty. \tag{7.40} \]

In Fig. 7.9(c) we illustrate convergence in probability for the case where the limiting random
variable is a constant x; we see that at the specified time n_0 most sample sequences
must be within ε of x. However, the sequences are not required to remain inside a 2ε corridor.
The weak law of large numbers is an example of convergence in probability. Thus we see
that the fundamental difference between almost-sure convergence and convergence in
probability is the same as that between the strong law and the weak law of large numbers.

FIGURE 7.10
Relations between different types of convergence and classification of
sequences introduced in the examples.

We now show that mean square convergence implies convergence in probability.
The Markov inequality (Eq. (4.75)) applied to (X_n - X)² implies

\[ P[|X_n - X| > \varepsilon] = P[(X_n - X)^2 > \varepsilon^2] \le \frac{E[(X_n - X)^2]}{\varepsilon^2}. \]

If the sequence converges in the mean square sense, then the right-hand side
approaches zero as n approaches infinity. It then follows that the sequence also
converges in probability. Figure 7.10 shows a Venn diagram that indicates that mean square
convergence implies convergence in probability. The diagram shows that all sequences
that converge in the mean square sense (designated by the set ms) are contained inside
the set p of all sequences that converge in probability. The diagram also shows some of
the sequences introduced in the examples.
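
The Markov-inequality argument can be illustrated with the sequence V_n(ζ) of Example 7.18, whose mean square error 1/(3n²) was found in Example 7.21. In this sketch the function names are hypothetical, and the exact deviation probability uses the fact that |V_n(ζ) - ζ| = ζ/n with ζ uniform on [0, 1].

```python
def exact_deviation_probability(n, eps):
    """P[|V_n - zeta| > eps] = P[zeta > n*eps] for zeta uniform on [0, 1]."""
    return max(0.0, 1.0 - n * eps)


def markov_bound(n, eps):
    """E[(V_n - zeta)^2] / eps^2, with E[(V_n - zeta)^2] = 1/(3 n^2)."""
    return 1.0 / (3.0 * n**2 * eps**2)


eps = 0.05
for n in (1, 5, 10, 19, 20, 40):
    print(n, exact_deviation_probability(n, eps), markov_bound(n, eps))
```

Both columns go to zero as n grows, and the bound is never below the exact probability, although it is loose for small n (it can exceed one).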

It can be shown that almost-sure convergence implies convergence in probability. 
However, almost-sure convergence does not always imply mean square convergence, 
as demonstrated by the following example. 


Example 7.22 


Does the sequence Z_n(ζ) in Example 7.18 converge in the mean square sense?
In Example 7.18, we found that Z_n(ζ) converges to 0 almost surely, so we consider

\[ E[(Z_n(\zeta) - 0)^2] = E[e^{-2n(n\zeta - 1)}] = e^{2n} \int_0^1 e^{-2n^2\zeta}\,d\zeta = \frac{e^{2n}}{2n^2}\left( 1 - e^{-2n^2} \right). \]

As n approaches infinity, the rightmost term approaches infinity. Therefore this sequence does
not converge in the mean square sense even though it converges almost surely.
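
The closed form for the mean square value of Z_n can be cross-checked against a direct numerical integration. This is an illustrative sketch: the function names and the midpoint rule with 20,000 steps are choices made here, not part of the text.

```python
from math import exp


def mean_square_closed_form(n):
    """e^{2n} (1 - e^{-2 n^2}) / (2 n^2), the closed form from Example 7.22."""
    return exp(2 * n) * (1.0 - exp(-2 * n * n)) / (2.0 * n * n)


def mean_square_numeric(n, steps=20000):
    """Midpoint-rule estimate of the integral of e^{-2n(n*zeta - 1)} over [0, 1]."""
    h = 1.0 / steps
    return h * sum(exp(-2 * n * (n * (i + 0.5) * h - 1.0)) for i in range(steps))


for n in (1, 2, 3, 5):
    print(n, mean_square_closed_form(n), mean_square_numeric(n))
```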


The following example shows that mean square convergence does not imply almost- 
sure convergence. 


Example 7.23 


Let R_n(ζ) be the error introduced by a communication channel in the nth transmission. Suppose
that the channel introduces errors in the following way: In the first transmission the channel
introduces an error; in the next two transmissions the channel randomly selects one transmission
to introduce an error, and it allows the other transmission to be error-free; in the next three
transmissions, the channel randomly selects one transmission to introduce an error, and it allows
the other transmissions to be error-free; and so on. Suppose that when errors are introduced,
they are uniformly distributed in the interval [1, 2]. Does the sequence of transmission errors
converge, and if so, in what sense?

Figure 7.11 shows the manner in which the channel introduces errors. The errors become
sparser as time progresses, so we expect that the sequence is approaching zero in the mean
square sense. The probability of error p_n in the nth transmission is 1/m for n in the interval from
1 + 2 + ... + (m - 1) = (m - 1)m/2 to 1 + 2 + ... + m = m(m + 1)/2. If we let Y be a uniform
random variable in the interval [1, 2], then the mean square error at time n is

\[ E[(R_n(\zeta) - 0)^2] = E[R_n^2] = E[Y^2]\,p_n + 0\,(1 - p_n) = \left( \frac{7}{3} \right) \frac{1}{m} \quad \text{for } \frac{(m-1)m}{2} < n \le \frac{m(m+1)}{2}. \]

Thus as n (and m) increases, the mean square error approaches zero and the sequence R_n
converges to zero in the mean square sense.

FIGURE 7.11
R_n converges in mean square sense but not almost surely.

In order for the sequence R_n to converge to 0 almost surely, almost all sample sequences
must eventually become and remain close to zero. However, the manner in which errors are
introduced guarantees that regardless of how large n becomes, a value in the range [1, 2] is certain
to occur some time later. Thus none of the sample sequences converges to zero, and the sequence
of random variables does not converge almost surely.
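
The error pattern of Example 7.23 is easy to generate, which makes both conclusions concrete: every block of m transmissions contains exactly one error in [1, 2], so no sample sequence settles at zero, while the mean square error (7/3)(1/m) shrinks. The sketch below uses hypothetical names and a seeded generator chosen only for repeatability.

```python
import random


def error_sequence(num_blocks, rng):
    """Errors R_1, R_2, ...: block m (m = 1, 2, ...) has m transmissions,
    exactly one of which, chosen uniformly, carries an error uniform on [1, 2]."""
    seq = []
    for m in range(1, num_blocks + 1):
        hit = rng.randrange(m)  # position of the single error in block m
        for i in range(m):
            seq.append(rng.uniform(1.0, 2.0) if i == hit else 0.0)
    return seq


def mean_square_error(m):
    """E[R_n^2] = E[Y^2] * (1/m) = (7/3)(1/m) for n in block m."""
    return (7.0 / 3.0) / m


seq = error_sequence(50, random.Random(1))
print(len(seq), sum(1 for r in seq if r != 0.0), mean_square_error(50))
```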


The last type of convergence we will discuss addresses the convergence of the 
cumulative distribution functions of a sequence of random variables, rather than the 
random variables themselves. 


Convergence in Distribution: The sequence of random variables {X_n} with
cumulative distribution functions {F_n(x)} converges in distribution to the random
variable X with cumulative distribution F(x) if

\[ F_n(x) \to F(x) \quad \text{as } n \to \infty \tag{7.41} \]

for all x at which F(x) is continuous.


The central limit theorem is an example of convergence in distribution. To see that
convergence in distribution does not make any statement regarding the convergence
of the random variables in a sequence, consider the Bernoulli iid sequence in
Example 7.19. These random variables do not converge in any of the previous convergence
modes. However, they trivially converge in distribution since they have the
same distribution for all n. All of the previous forms of convergence imply convergence
in distribution as indicated in Fig. 7.10.


*7.5 LONG-TERM ARRIVAL RATES AND ASSOCIATED AVERAGES


In many problems events of interest occur at random times, and we are interested in
the long-term average rate at which the events occur. For example, suppose that a
new electronic component is installed at time t = 0 and that it fails at time X_1; an
identical new component is installed immediately, and it fails after X_2 seconds, and so
on. Let N(t) be the number of components that have failed by time t. N(t) is called a
renewal counting process. In this section, we are interested in the behavior of N(t)/t
as t becomes very large.

Let X_j denote the lifetime of the jth component, then the time when the nth
component fails is given by

\[ S_n = X_1 + X_2 + \cdots + X_n, \tag{7.42} \]

where we assume that the X_j are iid nonnegative random variables with 0 < E[X] =
E[X_j] < ∞. We say that S_n is the time of the nth arrival or renewal, and we call the
X_j's the interarrival or cycle times. Figure 7.12 shows a realization of N(t) and the
associated sequence of interarrival times. The lines in the time axis indicate the arrival
times. Note that N(t) is a nondecreasing, integer-valued staircase function of time that
increases without bound as t approaches infinity.

Since the mean interarrival time is E[X] seconds per event, we expect intuitively
that N(t) grows at a rate of 1/E[X] events per second. We will now use the strong law of
large numbers to show this is the case. The average arrival rate in the first t seconds is
given by N(t)/t. We will show that with probability one, N(t)/t → 1/E[X] as t → ∞.

FIGURE 7.12
A counting process and its interarrival times.

Since N(t) is the number of arrivals up to time t, then S_{N(t)} is the time of the
last arrival prior to time t, and S_{N(t)+1} is the time of the first arrival after time t (see
Fig. 7.13). Therefore

\[ S_{N(t)} \le t < S_{N(t)+1}. \]

If we divide the above equation by N(t), we obtain

\[ \frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{S_{N(t)+1}}{N(t)}. \tag{7.43} \]

FIGURE 7.13
Time of first arrival after time t and first arrival before time t.

The term on the left-hand side is the sample average interarrival time for the first N(t)
arrivals:

\[ \frac{S_{N(t)}}{N(t)} = \frac{1}{N(t)} \sum_{j=1}^{N(t)} X_j. \]

As t → ∞, N(t) approaches infinity so the above sample average converges to E[X],
with probability one, by the strong law of large numbers. We now show that the term on
the right-hand side also approaches E[X]:

\[ \frac{S_{N(t)+1}}{N(t)} = \frac{S_{N(t)+1}}{N(t)+1} \left( \frac{N(t)+1}{N(t)} \right). \]

As t → ∞, the first term on the right-hand side approaches E[X] and the second
term approaches 1 with probability one. Thus the lower and upper terms in Eq. (7.43)
both approach E[X] with probability one as t approaches infinity. We have proved
the following theorem:


Theorem 1   Arrival Rate for iid Interarrivals


Let N(t) be the counting process associated with the iid interarrival sequence X_j, with
0 < E[X_j] = E[X] < ∞. Then with probability one,

\[ \lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{E[X]}. \tag{7.44} \]


Example 7.24 Exponential Interarrivals 


Customers arrive at a service station with iid exponential interarrival times with mean
E[X_j] = 1/α. Find the long-term average arrival rate.
From Theorem 1, it immediately follows that with probability one,

\[ \lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{1/\alpha} = \alpha. \]

Thus α represents the long-term average arrival rate.
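
The long-term arrival rate in Example 7.24 can be observed in a simulation. This is a minimal sketch under stated assumptions: the function name, the rate value 2.0, the horizon, and the seeded generator are all illustrative choices, and a single finite run only approximates the limit in Theorem 1.

```python
import random


def arrival_rate(alpha, horizon, rng):
    """Simulate iid exponential interarrival times with mean 1/alpha and
    return N(t)/t at t = horizon, where N(t) counts arrivals in [0, t]."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(alpha)  # next interarrival time, mean 1/alpha
        if t > horizon:
            break
        count += 1
    return count / horizon


alpha = 2.0
for horizon in (10.0, 1000.0, 50000.0):
    print(horizon, arrival_rate(alpha, horizon, random.Random(0)))
```

With longer horizons the ratio N(t)/t is expected to settle near α, consistent with Theorem 1.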


Example 7.25 Repair Cycles 


Let $U_j$ be the “up” time during which a system is continuously functioning, and let $D_j$ be the
“down” time required to repair the system when it breaks down. Find the long-term average rate
at which repairs need to be done.
Define a repair cycle to consist of an “up” time followed by a “down” time, $X_j = U_j + D_j$;
then the average cycle time is $E[U] + E[D]$. The number of repairs required by time $t$ is $N(t)$,
and by Theorem 1, the rate at which repairs need to be done is

$$\lim_{t\to\infty} \frac{N(t)}{t} = \frac{1}{E[U] + E[D]}.$$

Long-Term Time Averages


Suppose that events occur at random with iid interevent times $X_j$, and that a cost $C_j$ is
associated with each occurrence of an event. Let $C(t)$ be the cost incurred up to time $t$.
We now determine the long-term behavior of $C(t)/t$, that is, the long-term average rate
at which costs are incurred.

We assume that the pairs $(X_j, C_j)$ form a sequence of iid random vectors, but
that $X_j$ and $C_j$ need not be independent; that is, the cost associated with an event
may depend on the associated interevent time. The total cost $C(t)$ incurred up to
time $t$ is then the sum of costs associated with the $N(t)$ events that have occurred up
to time $t$:

$$C(t) = \sum_{j=1}^{N(t)} C_j. \tag{7.45}$$

The time average of the cost up to time $t$ is $C(t)/t$, thus

$$\frac{C(t)}{t} = \frac{1}{t}\sum_{j=1}^{N(t)} C_j = \frac{N(t)}{t}\left\{\frac{1}{N(t)}\sum_{j=1}^{N(t)} C_j\right\}. \tag{7.46}$$

By Theorem 1, as $t \to \infty$, the first term on the right-hand side approaches $1/E[X]$ with
probability one. The expression inside the brackets is simply the sample mean of the
first $N(t)$ costs. As $t \to \infty$, $N(t)$ approaches infinity, so the second term approaches $E[C]$
with probability one, by the strong law of large numbers. Thus we have the following
theorem:


Theorem 2 Cost Accumulation Rate 


Let $(X_j, C_j)$ be a sequence of iid interevent times and associated costs, with $0 < E[X_j] < \infty$
and $E[C_j] < \infty$, and let $C(t)$ be the cost incurred up to time $t$. Then, with probability one,

$$\lim_{t\to\infty} \frac{C(t)}{t} = \frac{E[C]}{E[X]}. \tag{7.47}$$


The following series of examples demonstrates how Theorem 2 can be used to
calculate long-term time averages.


Example 7.26 Long-Term Proportion of "Up" Time 


Find the long-term proportion of time that the system is “up” in Example 7.25. 
Let $I_U(t)$ be equal to one if the system is up at time $t$ and zero otherwise; then the long-term
proportion of time in which the system is up is

$$\lim_{t\to\infty} \frac{1}{t}\int_0^t I_U(t')\,dt',$$

where the integral is the total time the system is up in the time interval $[0, t]$.
Now define a cycle to consist of a system “up” time followed by a “down” time; then
$X_j = U_j + D_j$, and $E[X] = E[U] + E[D]$. If we let the cost associated with each cycle be the
“up” time $U_j$, then if $t$ is an instant when a cycle ends,

$$\int_0^t I_U(t')\,dt' = \sum_{j=1}^{N(t)} U_j = C(t).$$

Thus $C(t)/t$ is the proportion of time that the system is “up” in the time interval $(0, t)$. By
Theorem 2, the long-term proportion of time that the system is “up” is

$$\lim_{t\to\infty} \frac{C(t)}{t} = \frac{E[U]}{E[U] + E[D]}.$$
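The ratio $E[U]/(E[U]+E[D])$ can be illustrated by simulating repair cycles. The sketch below is in Python rather than the book's Octave, and the distributions are assumptions chosen only for illustration: exponential up times with mean 8 and down times uniform on $[1, 3]$ (mean 2), so the predicted long-term up fraction is $8/(8+2) = 0.8$. The helper name `up_fraction` is hypothetical.

```python
import random

def up_fraction(n_cycles, mean_up, down_lo, down_hi, seed=0):
    """Fraction of time spent 'up' over n_cycles complete repair cycles.

    Assumed model (not from the text): exponential up times with the given
    mean, and down times uniform on [down_lo, down_hi].
    """
    rng = random.Random(seed)
    total_up = 0.0     # accumulated cost C(t) = sum of up times U_j
    total_time = 0.0   # elapsed time t = sum of cycle lengths X_j = U_j + D_j
    for _ in range(n_cycles):
        up = rng.expovariate(1.0 / mean_up)
        down = rng.uniform(down_lo, down_hi)
        total_up += up
        total_time += up + down
    return total_up / total_time

if __name__ == "__main__":
    print(up_fraction(200_000, mean_up=8.0, down_lo=1.0, down_hi=3.0))
```

Stopping at the end of a cycle matches the condition in the example that $t$ is an instant when a cycle ends.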


Example 7.27 


In the previous example, suppose that a cost $C_j$ is associated with each repair. Find the long-term
average rate at which repair costs are incurred.

The mean interevent time is $E[U] + E[D]$, and the mean cost per repair is $E[C]$. Thus by
Theorem 2, the long-term average repair cost rate is

$$\lim_{t\to\infty} \frac{C(t)}{t} = \frac{E[C]}{E[U] + E[D]}.$$

Example 7.28 A Packet Voice Transmission System 


A packet voice multiplexer can transmit up to $M$ packets every 10-millisecond period. Let $N$ be
the number of packets input into the multiplexer every 10 ms. If $N \le M$ the multiplexer transmits
all $N$ packets, and if $N > M$ the multiplexer transmits $M$ packets and discards $(N - M)$
packets. Find the long-term proportion of packets discarded by the multiplexer.

Define a “cycle” by $X_j = N_j$, that is, the length of the “cycle” is equal to the number of
packets produced in the $j$th interval. Define the cost in the $j$th cycle by
$C_j = (N_j - M)^+ = \max(N_j - M, 0)$, that is, the number of packets that are discarded in the
$j$th cycle. With these definitions, $t$ represents the first $t$ packets input into the multiplexer
and $C(t)$ represents the number that had to be discarded. The long-term proportion of packets
discarded is then

$$\lim_{t\to\infty} \frac{C(t)}{t} = \frac{E[(N - M)^+]}{E[N]},$$

where

$$E[(N - M)^+] = \sum_{k=M}^{\infty} (k - M)\,p_k,$$

and $p_k$ is the pmf of $N$.

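The discard proportion $E[(N-M)^+]/E[N]$ is a direct sum once a pmf for $N$ is fixed. The Python sketch below (the book's own listings use Octave) evaluates it for a pmf supplied as a list; the Poisson pmf in the demo, its mean, and the value of $M$ are assumptions made only for illustration, and `discard_fraction` and `poisson_pmf` are hypothetical helper names.

```python
import math

def poisson_pmf(k, mean):
    """P[N = k] for a Poisson random variable (assumed input model)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def discard_fraction(pmf, M):
    """Return E[(N - M)^+] / E[N] for pmf[k] = P[N = k], k = 0, 1, ..."""
    mean_n = sum(k * p for k, p in enumerate(pmf))
    # Only terms with k > M contribute to the expected number discarded.
    mean_excess = sum((k - M) * p for k, p in enumerate(pmf) if k > M)
    return mean_excess / mean_n

if __name__ == "__main__":
    # Poisson pmf truncated at k = 59; the neglected tail is negligible here.
    pmf = [poisson_pmf(k, 4.0) for k in range(60)]
    print(discard_fraction(pmf, M=6))
```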
Example 7.29 The Residual Lifetime 
Let $X_1, X_2, \ldots$ be a sequence of interarrival times, and let the residual lifetime $r(t)$ be defined as
the time from an arbitrary time instant $t$ until the next arrival as shown in Fig. 7.14. Find the
long-term proportion of time that $r(t)$ exceeds $c$ seconds.

FIGURE 7.14
Residual lifetime in a cycle.


The amount of time that the residual lifetime exceeds $c$ in a cycle of length $X$ is $(X - c)^+$,
that is, $X - c$ when the cycle is longer than $c$ seconds, and 0 when it is shorter than $c$ seconds.
The long-term proportion of time that $r(t)$ exceeds $c$ seconds is obtained from Theorem 2 by
defining the cost per cycle by $C_j = (X_j - c)^+$:

$$\begin{aligned}
\text{proportion of time } r(t) \text{ exceeds } c &= \frac{E[(X - c)^+]}{E[X]} \\
&= \frac{1}{E[X]}\int_0^{\infty} P[X > x + c]\,dx \\
&= \frac{1}{E[X]}\int_0^{\infty} \{1 - F_X(x + c)\}\,dx \\
&= \frac{1}{E[X]}\int_c^{\infty} \{1 - F_X(y)\}\,dy,
\end{aligned} \tag{7.48}$$

where Eq. (4.28) was used for $E[(X - c)^+]$ in the second equality. This result is used extensively
in reliability theory and in queueing theory.


*7.6 CALCULATING DISTRIBUTIONS USING THE DISCRETE FOURIER TRANSFORM 


In many situations we are forced to obtain the pmf or pdf of a random variable from its
characteristic function using numerical methods because the inverse transform cannot
be expressed in closed form. In the most common case, we are interested in finding the
pmf/pdf corresponding to $\Phi_X(\omega)^n$, which is the characteristic function of
the sum of $n$ iid random variables. In this section we introduce the discrete Fourier transform,
which enables us to perform this numerical calculation in an efficient manner.


7.6.1 Discrete Random Variables


First, suppose that X is an integer-valued random variable that takes on values in the 
set {0,1,..., N — 1}. The pmf of the sum of n such independent random variables is 
given by the n-fold convolution of the pmf of X, or equivalently by the nth power of the 
characteristic function of X. Therefore we can deal with the sum of n random variables 
through the convolution of pmf’s or through the product of characteristic functions 
and inverse transforms. Let us first consider the convolution approach. 


Example 7.30 


Use Octave to calculate the pmf of $Z = U_1 + U_2 + U_3 + U_4$ where the $U_i$ are iid uniform discrete
random variables in the set $\{0, 1, \ldots, 9\}$.

Octave and MATLAB provide a function for convolving the elements of two vectors. The
sequence of commands below produces a 4-fold convolution of the above discrete uniform pmf.
The first convolution of the pmf with itself yields a pmf with triangular shape. Figure 7.15 shows
that the 4-fold convolution is beginning to have a bell-shaped form.

> P=[1,1,1,1,1,1,1,1,1,1]/10;
> P2=conv(P,P);
> stem(P2,"@11")
> hold on
> stem(conv(P2,P2),"@22")
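For readers without Octave, the same 4-fold convolution can be sketched in plain Python. The `convolve` helper below is a hypothetical stand-in for Octave's `conv`, written as a direct double loop; the plotting step is omitted.

```python
def convolve(a, b):
    """Discrete convolution of two sequences (plain-Python analogue of conv)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

P = [0.1] * 10          # discrete uniform pmf on {0, 1, ..., 9}
P2 = convolve(P, P)     # pmf of U1 + U2: triangular, values 0..18
P4 = convolve(P2, P2)   # pmf of U1 + U2 + U3 + U4: values 0..36

if __name__ == "__main__":
    for k, pk in enumerate(P4):
        print(k, pk)
```

The direct double loop costs on the order of the product of the two lengths, which is why a transform-based approach becomes attractive for long sequences.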


FIGURE 7.15
pmf of sum of random variables using convolution method.

If a large number of sample values is involved in the calculations, then the characteristic
function approach is more efficient. The characteristic function for this integer-valued
random variable is

$$\Phi_X(\omega) = \sum_{k=0}^{N-1} p_k e^{j\omega k}, \tag{7.49}$$

where $p_k = P[X = k]$ is the pmf. $\Phi_X(\omega)$ is a periodic function of $\omega$ with period $2\pi$
since $e^{j(\omega + 2\pi)k} = e^{j\omega k}e^{jk2\pi} = e^{j\omega k}$.²
Consider the characteristic function at $N$ equally spaced values in the interval $[0, 2\pi)$:

$$c_m = \Phi_X\!\left(\frac{2\pi m}{N}\right) = \sum_{k=0}^{N-1} p_k e^{j2\pi km/N}, \qquad m = 0, 1, \ldots, N-1. \tag{7.50}$$


Equation (7.50) defines the discrete Fourier transform (DFT) of the sequence
$p_0, \ldots, p_{N-1}$. (The sign in the exponent in Eq. (7.50) is the opposite of that used in the
usual definition of the DFT.) In general, the $c_m$’s are complex numbers. Note that if we
extend the range of $m$ outside the range $\{0, \ldots, N-1\}$ we obtain a periodic sequence
consisting of a repetition of the basic sequence $c_0, \ldots, c_{N-1}$.

The sequence of $p_k$’s can be obtained from the sequence of $c_m$’s using the inverse
DFT formula:

$$p_k = \frac{1}{N}\sum_{m=0}^{N-1} c_m e^{-j2\pi km/N}, \qquad k = 0, 1, \ldots, N-1. \tag{7.51}$$
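The transform pair in Eqs. (7.50) and (7.51) can be written out directly. The Python sketch below (the book's listings use Octave) uses the positive exponent for the forward transform, as the text does, and the negative exponent for the inverse; the function names and the three-point pmf are illustrative assumptions.

```python
import cmath

def dft_plus(p):
    """c_m = sum_k p_k exp(+j 2 pi k m / N), the sign convention of Eq. (7.50)."""
    N = len(p)
    return [sum(p[k] * cmath.exp(2j * cmath.pi * k * m / N) for k in range(N))
            for m in range(N)]

def idft_minus(c):
    """p_k = (1/N) sum_m c_m exp(-j 2 pi k m / N), as in Eq. (7.51)."""
    N = len(c)
    return [sum(c[m] * cmath.exp(-2j * cmath.pi * k * m / N) for m in range(N)) / N
            for k in range(N)]

pmf = [0.2, 0.5, 0.3]                        # illustrative pmf, not from the text
c = dft_plus(pmf)                            # samples of the characteristic function
recovered = [x.real for x in idft_minus(c)]  # imaginary parts are roundoff only

if __name__ == "__main__":
    print(c)
    print(recovered)
```

Both loops cost on the order of $N^2$ complex multiplications; an FFT routine computes the same quantities faster, but many libraries use the opposite sign in the forward transform, so the convention should be checked before substituting one.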


Example 7.31 


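A discrete random variable $X$ has pmf

$$p_0 = \frac{1}{2}, \quad p_1 = \frac{3}{8}, \quad \text{and} \quad p_2 = \frac{1}{8}.$$

Find the characteristic function of $X$, the DFT for $N = 3$, and verify the inverse transform formula.
The characteristic function of $X$ is given by Eq. (7.49):

$$\Phi_X(\omega) = \frac{1}{2} + \frac{3}{8}e^{j\omega} + \frac{1}{8}e^{j2\omega}.$$

The DFT for $N = 3$ is given by the values of the characteristic function at $\omega = 2\pi m/3$, for
$m = 0, 1, 2$:

$$\begin{aligned}
c_0 &= \Phi_X(0) = 1 \\
c_1 &= \Phi_X\!\left(\frac{2\pi}{3}\right) = \frac{1}{2} + \frac{3}{8}e^{j2\pi/3} + \frac{1}{8}e^{j4\pi/3} = \frac{1}{4} + j\frac{\sqrt{3}}{8} \\
c_2 &= \Phi_X\!\left(\frac{4\pi}{3}\right) = \frac{1}{2} + \frac{3}{8}e^{j4\pi/3} + \frac{1}{8}e^{j8\pi/3} = \frac{1}{4} - j\frac{\sqrt{3}}{8},
\end{aligned}$$

where we have used Euler’s formula to evaluate the complex exponentials.
We substitute the $c_m$’s into Eq. (7.51) to recover the pmf:

$$\begin{aligned}
p_0 &= \frac{1}{3}(c_0 + c_1 + c_2) = \frac{1}{3}\left\{1 + \frac{1}{4} + j\frac{\sqrt{3}}{8} + \frac{1}{4} - j\frac{\sqrt{3}}{8}\right\} = \frac{1}{2} \\
p_1 &= \frac{1}{3}\left(c_0 + c_1 e^{-j2\pi/3} + c_2 e^{-j4\pi/3}\right) = \frac{3}{8} \\
p_2 &= \frac{1}{3}\left(c_0 + c_1 e^{-j4\pi/3} + c_2 e^{-j8\pi/3}\right) = \frac{1}{8}.
\end{aligned}$$

² This follows from Euler’s formula $e^{j\theta} = \cos\theta + j\sin\theta$.
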

The range of the integer-valued random variable $X$ can be extended to the larger
set $\{0, 1, \ldots, N-1, N, \ldots, L-1\}$ by defining a new pmf $p'_j$ given by

$$p'_j = \begin{cases} p_j & 0 \le j \le N-1 \\ 0 & N \le j \le L-1. \end{cases} \tag{7.52}$$

The characteristic function of the random variable, $\Phi_X(\omega)$, remains unchanged, but
the associated DFT now involves evaluating $\Phi_X(\omega)$ at a different set of points:

$$c_m = \Phi_X\!\left(\frac{2\pi m}{L}\right) \qquad \text{for } m = 0, \ldots, L-1. \tag{7.53}$$

The inverse transform of the sequence in Eq. (7.53) then yields Eq. (7.52). Thus the
pmf can be recovered using the DFT on $L \ge N$ samples of $\Phi_X(\omega)$ as specified by
Eq. (7.53). In essence, we have only padded the pmf with $L - N$ zeros in Eq. (7.52).

The zero-padding method discussed above is required to evaluate the pmf of a
sum of iid random variables. Suppose that

$$Z = X_1 + X_2 + \cdots + X_n,$$

where the $X_i$ are integer-valued iid random variables with characteristic function
$\Phi_X(\omega)$. If the $X_i$ assume values from $\{0, 1, \ldots, N-1\}$, then $Z$ will assume values
from $\{0, \ldots, n(N-1)\}$. The pmf of $Z$ is found using the DFT evaluated at the
$L = n(N-1) + 1$ points:

$$d_m = \Phi_Z\!\left(\frac{2\pi m}{L}\right) = \Phi_X\!\left(\frac{2\pi m}{L}\right)^{n}, \qquad m = 0, \ldots, L-1,$$

since $\Phi_Z(\omega) = \Phi_X(\omega)^n$. Note that this requires evaluating the characteristic function
of $X$ at $L > N$ points. The pmf of $Z$ is then found from

$$P[Z = k] = \frac{1}{L}\sum_{m=0}^{L-1} d_m e^{-j2\pi km/L}, \qquad k = 0, 1, \ldots, L-1. \tag{7.54}$$

Example 7.32


Let $Z = X_1 + X_2$, where the $X_i$ are iid random variables with characteristic function:

$$\Phi_X(\omega) = \frac{1}{3} + \frac{2}{3}e^{j\omega}.$$

Find $P[Z = 1]$ using the DFT method.
$X$ assumes values from $\{0, 1\}$ and $Z$ from $\{0, 1, 2\}$, so $\Phi_Z(\omega) = \Phi_X(\omega)^2$ needs to be
evaluated at three points:

$$d_0 = 1, \quad d_1 = \Phi_X\!\left(\frac{2\pi}{3}\right)^2 = -\frac{1}{3}, \quad \text{and} \quad d_2 = \Phi_X\!\left(\frac{4\pi}{3}\right)^2 = -\frac{1}{3}.$$

Substituting these values into Eq. (7.54) with $k = 1$ gives

$$P[Z = 1] = \frac{1}{3}\left\{d_0 + d_1 e^{-j2\pi/3} + d_2 e^{-j4\pi/3}\right\} = \frac{1}{3}\left\{1 - \frac{1}{3}\left(e^{-j2\pi/3} + e^{-j4\pi/3}\right)\right\} = \frac{4}{9}.$$

We can verify this answer by noting that

$$P[Z = 1] = P[\{X_1 = 0\} \cap \{X_2 = 1\}] + P[\{X_1 = 1\} \cap \{X_2 = 0\}] = \frac{1}{3}\cdot\frac{2}{3} + \frac{2}{3}\cdot\frac{1}{3} = \frac{4}{9}.$$
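The DFT recipe for the pmf of a sum can be sketched in a few lines of Python (the book's listings use Octave). The helper `pmf_of_sum` and the two characteristic-function definitions are hypothetical names introduced here: the characteristic function of one term is sampled at $L = n(N-1)+1$ points, raised to the $n$th power, and inverted with the negative-exponent sum. The first call uses the two-point characteristic function $\frac{1}{3} + \frac{2}{3}e^{j\omega}$; the second applies the same helper to ten iid discrete uniform variables on $\{0,\ldots,9\}$.

```python
import cmath

def pmf_of_sum(phi, n, N):
    """pmf of the sum of n iid variables on {0,...,N-1} with char. function phi."""
    L = n * (N - 1) + 1                      # number of values the sum can take
    d = [phi(2 * cmath.pi * m / L) ** n for m in range(L)]
    return [
        (sum(d[m] * cmath.exp(-2j * cmath.pi * k * m / L) for m in range(L)) / L).real
        for k in range(L)
    ]

def phi_two_point(w):
    """Characteristic function 1/3 + (2/3) e^{jw}."""
    return 1 / 3 + (2 / 3) * cmath.exp(1j * w)

def phi_uniform10(w):
    """Characteristic function of a discrete uniform variable on {0,...,9}."""
    return sum(cmath.exp(1j * w * k) for k in range(10)) / 10

pz = pmf_of_sum(phi_two_point, n=2, N=2)      # three-point pmf of X1 + X2
p10 = pmf_of_sum(phi_uniform10, n=10, N=10)   # 91-point pmf of a 10-fold sum

if __name__ == "__main__":
    print(pz)
```

This direct inverse sum costs on the order of $L^2$ operations; it is meant to mirror Eq. (7.54) term by term rather than to be fast.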


In practice we are interested in using the DFT when the number of points in the
pmf is large. An examination of Eq. (7.51) shows that the calculation of all $N$ points requires
approximately $N^2$ multiplications of complex numbers. Thus if $N = 2^{10} = 1024$,
approximately $10^6$ multiplications will be required. The popularity of the DFT method
stems from the fact that algorithms, called fast Fourier transform (FFT) algorithms,
have been developed that can carry out the above calculations in $N \log_2 N$ multiplications.
For $N = 2^{10}$, $10^4$ multiplications will be required, a reduction by a factor of 100.


Example 7.33 


Use Octave to calculate the pmf of $Z = U_1 + U_2 + \cdots + U_{10}$ where the $U_i$ are iid uniform
discrete random variables in the set $\{0, 1, \ldots, 9\}$.

The commands below show the definition of the discrete uniform pmf and the calculation
of the FFT. This result is raised to the 10th power and the inverse transform is calculated. Figure
7.16 shows that the resulting pmf is starting to look very Gaussian in shape.

> P=[1,1,1,1,1,1,1,1,1,1]/10;
> bar(ifft(fft(P,128).^10))

FIGURE 7.16
FFT calculation of 10-fold convolution of discrete uniform random variable
{0,1,..., 9}.


So far, we have restricted $X$ to be an integer-valued random variable that takes
on only a finite set of values $S_X = \{0, 1, \ldots, N-1\}$. We now consider the case where
$S_X = \{0, 1, 2, \ldots\}$. Suppose that we know $\Phi_X(\omega)$, and that we obtain a pmf $p'_k$ from
Eq. (7.51) using a finite set of sample points from $\Phi_X(\omega)$, $c_m = \Phi_X(2\pi m/N)$ for
$m = 0, 1, \ldots, N-1$:

$$p'_k = \frac{1}{N}\sum_{m=0}^{N-1} c_m e^{-j2\pi km/N}, \qquad k = 0, 1, \ldots, N-1. \tag{7.55}$$

To see what this calculation yields consider the points $c_m$:

$$\begin{aligned}
c_m = \Phi_X\!\left(\frac{2\pi m}{N}\right) &= \sum_{n=0}^{\infty} p_n e^{j2\pi mn/N} \\
&= (p_0 + p_N + p_{2N} + \cdots)e^{j0} \\
&\quad + (p_1 + p_{N+1} + p_{2N+1} + \cdots)e^{j2\pi m/N} \\
&\quad + \cdots \\
&\quad + (p_{N-1} + p_{2N-1} + \cdots)e^{j2\pi m(N-1)/N} \\
&= \sum_{k=0}^{N-1} p'_k e^{j2\pi km/N},
\end{aligned} \tag{7.56}$$

where we have used the fact that $e^{j2\pi mn/N} = e^{j2\pi m(n+hN)/N}$ for $h$ an integer, to obtain
the second equality and where for $k = 0, \ldots, N-1$,

$$p'_k = p_k + p_{N+k} + p_{2N+k} + \cdots. \tag{7.57}$$

Equation (7.55) states that the inverse transform of the points $c_m = \Phi_X(2\pi m/N)$ will
yield $p'_0, \ldots, p'_{N-1}$, which are equal to the desired value $p_k$ plus the error

$$e_k = p_{N+k} + p_{2N+k} + \cdots.$$

Since the pmf must decay to zero as $k$ increases, the error term can be made small by
making $N$ sufficiently large. The following example carries out an evaluation of the
above error term in a case where the pmf is known. In practice, the pmf is not known so
the appropriate value of $N$ is found by trial and error.


Example 7.34 


Suppose that $X$ is a geometric random variable. How large should $N$ be so that the percent error
is 1%?
The error term for $p_k$ is given by

$$e_k = \sum_{h=1}^{\infty} p_{k+hN} = \sum_{h=1}^{\infty} (1 - p)p^{k+hN} = (1 - p)p^k \frac{p^N}{1 - p^N}.$$

The percent error term for $p_k$ is

$$\frac{e_k}{p_k} = \frac{p^N}{1 - p^N}.$$

By solving for $N$, we find that the error is less than $a = 0.01$ if

$$N > \frac{\log(a/(1 + a))}{\log p} \approx \frac{-2.0}{\log_{10} p}.$$

Thus for example if $p = .1, .5, .9$, then the required $N$ is 2, 7, and 44, respectively. These numbers
show how the required $N$ depends strongly on the rate of decay of the pmf.
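The aliasing error can be observed directly by sampling a known characteristic function and inverting it. The Python sketch below (not the book's Octave) assumes the geometric pmf $p_k = (1-p)p^k$, $k = 0, 1, \ldots$, whose characteristic function is $(1-p)/(1 - pe^{j\omega})$; the helper names are hypothetical. It compares the recovered values with the exact pmf and returns the largest relative error, which should match $p^N/(1-p^N)$.

```python
import cmath

def aliased_pmf(phi, N):
    """Inverse DFT of N samples of a characteristic function, as in Eq. (7.55)."""
    c = [phi(2 * cmath.pi * m / N) for m in range(N)]
    return [
        (sum(c[m] * cmath.exp(-2j * cmath.pi * k * m / N) for m in range(N)) / N).real
        for k in range(N)
    ]

def geometric_phi(p):
    """Characteristic function of the pmf p_k = (1 - p) p^k, k = 0, 1, 2, ..."""
    return lambda w: (1 - p) / (1 - p * cmath.exp(1j * w))

def relative_error(p, N):
    """Largest relative error of the recovered pmf over k = 0, ..., N-1."""
    approx = aliased_pmf(geometric_phi(p), N)
    exact = [(1 - p) * p**k for k in range(N)]
    return max(abs(a - e) / e for a, e in zip(approx, exact))

if __name__ == "__main__":
    for p, N in ((0.1, 2), (0.5, 7), (0.9, 44)):
        print(p, N, relative_error(p, N), p**N / (1 - p**N))
```

For this pmf the relative error is the same for every $k$, so the maximum is only a convenience.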


Continuous Random Variables 


Let $X$ be a continuous random variable, and suppose that we are interested in finding
the pdf of $X$ from $\Phi_X(\omega)$ using a numerical method. We can take the inverse Fourier
transform formula and approximate it by a sum over intervals of width $\omega_0$:

$$\begin{aligned}
f_X(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi_X(\omega)e^{-j\omega x}\,d\omega \\
&\approx \frac{1}{2\pi}\sum_{m=-M}^{M-1} \Phi_X(m\omega_0)e^{-jm\omega_0 x}\,\omega_0,
\end{aligned} \tag{7.58}$$

where the sum neglects the integral outside the range $[-M\omega_0, M\omega_0)$. The above sum
takes on the form of a DFT if we consider the pdf in the range $[-\pi/\omega_0, \pi/\omega_0)$ with
$x = nd$, $d = 2\pi/N\omega_0$, and $N = 2M$:

$$f_X(nd) \approx \frac{\omega_0}{2\pi}\sum_{m=-M}^{M-1} \Phi_X(m\omega_0)e^{-j2\pi nm/N}, \qquad -M \le n \le M-1. \tag{7.59}$$

Equation (7.59) is a $2M$-point DFT of the sequence

$$c_m = \frac{\omega_0}{2\pi}\Phi_X(m\omega_0).$$

The FFT algorithm requires that $n$ range from 0 to $2M - 1$. Equation (7.59) can be
cast into this form by recalling that the sequence $c_m$ is periodic with period $N$. An FFT
algorithm will then calculate Eq. (7.59) if we input the sequence

$$c'_m = \begin{cases} c_m & 0 \le m \le M-1 \\ c_{m-2M} & M \le m \le 2M-1. \end{cases}$$

Three types of errors are introduced in approximating the pdf using Eq. (7.59).
The first error involves approximating the integral by a sum. The second error results
from neglecting the integral for frequencies outside the range $[-M\omega_0, M\omega_0)$. The third
error results from neglecting the pdf outside the range $[-\pi/\omega_0, \pi/\omega_0)$. The first and
third errors are reduced by reducing $\omega_0$. The second error can be decreased by increasing
$M$ while keeping $\omega_0$ fixed.


Example 7.35 


The Laplacian random variable with parameter $\alpha = 1$ has characteristic function

$$\Phi_X(\omega) = \frac{1}{1 + \omega^2}, \qquad -\infty < \omega < \infty.$$

Figures 7.17(a) and 7.17(b) compare the pdf with the approximation obtained using Eq. (7.59)
with $N = 512$ points and two values of $\omega_0$. It can be seen that decreasing $\omega_0$ increases the
accuracy of the approximation.

The Octave code for obtaining the figure is shown below. The first part shows the commands
to generate the characteristic function and call the FFT function fft_pxs, which calculates
the pdf. The function fft_pxs accepts a vector of values of the characteristic function from
-M (negative frequencies) to M-1 (positive frequencies). The function forms a new vector
where the negative frequency terms are placed in the last M entries. It performs the FFT and
then shifts the results back.

(a) Interactive commands
>N=512
>M=N/2;
>w0=1;
>n=[-M:(M-1)];
>phix=1./(1+w0^2*(n.*n)); % Evaluate the characteristic function.
>fx=zeros(size(n));
>[n1,x1,afx1]=fft_pxs(phix,w0,N); % Find inverse of characteristic function.
>fx1=laplace_pdf(x1); % Calculate exact pdf.
>plot(n1,afx1)
>hold on;
>plot(n1,fx1)

FIGURE 7.17
(a) Comparison of exact pdf and pdf obtained by numerically inverting the characteristic function of a Laplacian random
variable. Approximation using $\omega_0 = 1$ and N = 512. (b) Comparison of exact pdf and pdf obtained by numerically inverting
the characteristic function of a Laplacian random variable. Approximation using $\omega_0 = 1/2$ and N = 512.


(b) Function definition
function [n,t,rx]=fft_pxs(sx,w0,N)
% Accepts N=2M samples of frequency spectrum from
% frequency range -M*w0 to (M-1)*w0;
% Performs periodic extension before 2M-point FFT;
% Performs FFT shift and returns time function
% in time range -M*d to (M-1)*d, where d=2pi/(N*w0)
M=N/2;
n=[-M:(M-1)];
d=2*pi/(N*w0);
t=n.*d;
sxc=zeros(size(n));
for j=1:M
sxc(j)=sx(j+M);     % Positive frequency terms occupy first M entries.
sxc(j+M)=sx(j);     % Move negative frequency terms to last M entries.
end
rx=zeros(size(n));
rx=fft(sxc);        % Calculate the FFT.
rx=rx.*w0./(2.*pi); % Scale by w0/(2*pi) as in Eq. (7.59).
rx=fftshift(rx);    % Rearrange vector values so negative-index
                    % terms occupy first M entries.
endfunction
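The sum in Eq. (7.59) can also be evaluated directly, without the reordering needed by an FFT routine. The Python sketch below is a slow but transparent stand-in for the Octave listing, not a translation of it: `pdf_from_cf` and `laplace_cf` are hypothetical names, and $N = 128$ is an assumption chosen so the $N^2$-term sum runs quickly in plain Python.

```python
import cmath
import math

def pdf_from_cf(phi, w0, N):
    """Approximate f_X(n d), n = -M..M-1, by direct evaluation of Eq. (7.59)."""
    M = N // 2
    d = 2 * math.pi / (N * w0)           # spacing of the x grid
    xs, fs = [], []
    for n in range(-M, M):
        total = sum(phi(m * w0) * cmath.exp(-2j * math.pi * n * m / N)
                    for m in range(-M, M))
        xs.append(n * d)
        fs.append((w0 / (2 * math.pi)) * total.real)
    return xs, fs

def laplace_cf(w):
    """Characteristic function of the Laplacian random variable with alpha = 1."""
    return 1.0 / (1.0 + w * w)

xs, fs = pdf_from_cf(laplace_cf, w0=1.0, N=128)

if __name__ == "__main__":
    for x, f in zip(xs, fs):
        print(x, f, 0.5 * math.exp(-abs(x)))   # approximation vs exact Laplacian pdf
```

Printing the approximation next to the exact pdf $\frac{1}{2}e^{-|x|}$ shows where the error is largest, which gives a quick feel for the three error sources listed above.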


SUMMARY 


• The expected value of a sum of random variables is always equal to the sum of
the expected values of the random variables. In general, the variance of such a
sum is not equal to the sum of the individual variances.

• The characteristic function of the sum of independent random variables is equal
to the product of the characteristic functions of the individual random variables.

• The sample mean and the relative frequency estimators are used to estimate the
expected value of random variables and the probabilities of events. The laws of
large numbers state conditions under which these estimators approach the true
values of the parameters they estimate as the number of samples becomes large.

• The central limit theorem states that the cdf of a sum of iid finite-mean, finite-variance
random variables approaches that of a Gaussian random variable. This
result allows us to approximate the pdf of sums of random variables by that of a
Gaussian random variable.

• The Chernoff bound provides estimates of the probability of the tails of a
distribution.

• A sequence of random variables can be viewed as a sequence of functions of $\zeta$, or as
a family of sample sequences, one sample sequence for each $\zeta$ in $S$. Sure and almost-sure
convergence address the question of whether all or almost all sample sequences
converge. Mean square convergence and convergence in probability do not
address the behavior of entire sample sequences but instead address the question of
whether the sample sequences are “close” to some $X$ at some particular time instant.

• A counting process counts the number of occurrences of an event in a certain time
interval. When the times between occurrences of events are iid random variables,
the strong law of large numbers enables us to obtain results concerning the rate at
which events occur, and results concerning various long-term time averages.

• The discrete Fourier transform and the FFT algorithm allow us to compute numerically
the pmf and pdf of random variables from their characteristic functions.


CHECKLIST OF IMPORTANT TERMS 


Almost-sure convergence
Arrival rate
Central limit theorem
Chernoff bound
Convergence in distribution
Convergence in probability
Discrete Fourier transform
Fast Fourier transform
iid random variables
Relative frequency
Renewal counting process
Sample mean
Sample variance
Sequence of random variables
Strong law of large numbers
Sure convergence
Weak law of large numbers


See Chung [1, pp. 220-233] for an insightful discussion of the laws of large numbers and
the central limit theorem. Chapter 6 in Gnedenko [2] gives a detailed discussion of the
laws of large numbers. Chapter 7 in Ross [3] focuses on counting processes and their
properties. Cadzow [4] gives a good introduction to the FFT algorithm. Larson and Shubert
[ref 8] and Stark and Woods [ref 9] contain excellent discussions on sequences of
random variables.


Section 7.1: Sums of Random Variables


7.1. Let $W = X + Y + Z$, where $X$, $Y$, and $Z$ are zero-mean, unit-variance random variables
with COV(X, Y) = 1/2, COV(Y, Z) = −1/4, and COV(X, Z) = 1/2.
(a) Find the mean and variance of $W$.
(b) Repeat part a assuming $X$, $Y$, and $Z$ are uncorrelated random variables.

7.2. Let $X_1, \ldots, X_n$ be random variables with the same mean and with covariance function:

$$\mathrm{COV}(X_i, X_j) = \begin{cases} \sigma^2 & \text{if } i = j \\ \rho\sigma^2 & \text{if } |i - j| = 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $|\rho| < 1$. Find the mean and variance of $S_n = X_1 + \cdots + X_n$.

7.3. Let $X_1, \ldots, X_n$ be random variables with the same mean and with covariance function

$$\mathrm{COV}(X_i, X_j) = \sigma^2\rho^{|i-j|},$$

where $|\rho| < 1$. Find the mean and variance of $S_n = X_1 + \cdots + X_n$.

7.4. Let $X$ and $Y$ be independent Cauchy random variables with parameters 1 and 4, respectively.
Let $Z = X + Y$.
(a) Find the characteristic function of $Z$.
(b) Find the pdf of $Z$ from the characteristic function found in part a.

7.5. Let $S_k = X_1 + \cdots + X_k$, where the $X_i$’s are independent random variables, with $X_i$ a
chi-square random variable with $n_i$ degrees of freedom. Show that $S_k$ is a chi-square random
variable with $n = n_1 + \cdots + n_k$ degrees of freedom.

7.6. Let $S_n = X_1^2 + \cdots + X_n^2$, where the $X_i$’s are iid zero-mean, unit-variance Gaussian random
variables.
(a) Show that $S_n$ is a chi-square random variable with $n$ degrees of freedom. Hint: See
Example 4.34.
(b) Use the methods of Section 4.5 to find the pdf of

$$T_n = \sqrt{X_1^2 + \cdots + X_n^2}.$$

(c) Show that $T_2$ is a Rayleigh random variable.
(d) Find the pdf for $T_3$. The random variable $T_3$ is used to model the speed of molecules
in a gas. $T_3$ is said to have the Maxwell distribution.

7.7. Let $X$ and $Y$ be independent exponential random variables with parameters 2 and 10,
respectively. Let $Z = X + Y$.
(a) Find the characteristic function of $Z$.
(b) Find the pdf of $Z$ from the characteristic function found in part a.

7.8. Let $Z = 3X - 7Y$, where $X$ and $Y$ are independent random variables.
(a) Find the characteristic function of $Z$.
(b) Find the mean and variance of $Z$ by taking derivatives of the characteristic function
found in part a.

7.9. Let $M_n$ be the sample mean of $n$ iid random variables $X_j$. Find the characteristic function
of $M_n$ in terms of the characteristic function of the $X_j$’s.

7.10. The number $X_j$ of raffle winners in classroom $j$ is a binomial random variable with
parameters $n_j$ and $p$. Suppose that the school has $K$ classrooms. Find the pmf of the total
number of raffle winners in the school, assuming the $X_j$’s are independent random variables.

7.11. The number of packet arrivals $X_i$ at port $i$ in a router is a Poisson random variable with
mean $\alpha_i$. Given that the router has $k$ ports, find the pmf for the total number of packet
arrivals at the router. Assume that the $X_i$’s are independent random variables.

7.12. Let $X_1, X_2, \ldots$ be a sequence of independent integer-valued random variables, let $N$ be
an integer-valued random variable independent of the $X_j$, and let

$$S = \sum_{k=1}^{N} X_k.$$

(a) Find the mean and variance of $S$.
(b) Show that

$$G_S(z) = E(z^S) = G_N(G_X(z)),$$

where $G_X(z)$ is the generating function of each of the $X_k$’s.

7.13. Let the number of smashed-up cars arriving at a body shop in a week be a Poisson random
variable with mean $L$. Each job repair costs $X_j$ dollars, where the $X_j$’s are iid random variables
that are equally likely to be $500 or $1000.
(a) Find the mean and variance of the total revenue $R$ arriving in a week.
(b) Find $G_R(z) = E[z^R]$.

7.14. Let the number of widgets tested in an assembly line in 1 hour be a binomial random
variable with parameters $n = 600$ and $p$. Suppose that the probability that a widget is
faulty is $a$. Let $S$ be the number of widgets that are found faulty in a 1-hour period.
(a) Find the mean and variance of $S$.
(b) Find $G_S(z) = E[z^S]$.


Section 7.2: The Sample Mean and the Laws of Large Numbers 


7.15. Suppose that the number of particle emissions by a radioactive mass in $t$ seconds is a Poisson
random variable with mean $\lambda t$. Use the Chebyshev inequality to obtain a bound for
the probability that $|N(t)/t - \lambda|$ exceeds $\varepsilon$.

7.16. Suppose that 20% of voters are in favor of certain legislation. A large number $n$ of voters
are polled and a relative frequency estimate $f_A(n)$ for the above proportion is obtained.
Use Eq. (7.20) to determine how many voters should be polled in order that the probability
is at least .95 that $f_A(n)$ differs from 0.20 by less than 0.02.

7.17. A fair die is tossed 20 times. Use Eq. (7.20) to bound the probability that the total number
of dots is between 60 and 80.

7.18. Let $X_i$ be a sequence of independent zero-mean, unit-variance Gaussian random variables.
Compare the bound given by Eq. (7.20) with the exact value obtained from the Q
function for $n = 16$ and $n = 81$.

7.19. Does the weak law of large numbers hold for the sample mean if the $X_i$’s have the covariance
functions given in Problem 7.2? Assume the $X_i$ have the same mean.

7.20. Repeat Problem 7.19 if the $X_i$’s have the covariance functions given in Problem 7.3.

7.21. (The sample variance) Let $X_1, \ldots, X_n$ be an iid sequence of random variables for which
the mean and variance are unknown. The sample variance is defined as follows:

$$V_n^2 = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - M_n)^2,$$

where $M_n$ is the sample mean.
(a) Show that

$$\sum_{j=1}^{n} (X_j - \mu)^2 = \sum_{j=1}^{n} (X_j - M_n)^2 + n(M_n - \mu)^2.$$

(b) Use the result in part a to show that

$$E\left[\sum_{j=1}^{n} (X_j - M_n)^2\right] = (n - 1)\sigma^2.$$

(c) Use part b to show that $E[V_n^2] = \sigma^2$. Thus $V_n^2$ is an unbiased estimator for the
variance.
(d) Find the expected value of the sample variance if $n - 1$ is replaced by $n$. Note that
this is a biased estimator for the variance.


Section 7.3: The Central Limit Theorem


7.22. (a) A fair coin is tossed 100 times. Estimate the probability that the number of heads is
between 40 and 60. Estimate the probability that the number is between 50 and 55.
(b) Repeat part a for $n = 1000$ and the intervals [400, 600] and [500, 550].

7.23. Repeat Problem 7.16 using the central limit theorem.

7.24. Use the central limit theorem to estimate the probability in Problem 7.17.

7.25. The lifetime of a cheap light bulb is an exponential random variable with mean 36
hours. Suppose that 16 light bulbs are tested and their lifetimes measured. Use the central
limit theorem to estimate the probability that the sum of the lifetimes is less than
600 hours.

7.26. A student uses pens whose lifetime is an exponential random variable with mean 1 week.
Use the central limit theorem to determine the minimum number of pens he should buy
at the beginning of a 15-week semester, so that with probability .99 he does not run out of
pens during the semester.

7.27. Let $S$ be the sum of 80 iid Poisson random variables with mean 0.25. Compare the
exact value of $P[S = k]$ to an approximation given by the central limit theorem as in
Eq. (7.30).

7.28. The number of messages arriving at a multiplexer is a Poisson random variable with
mean 15 messages/second. Use the central limit theorem to estimate the probability that
more than 950 messages arrive in one minute.

7.29. A binary transmission channel introduces bit errors with probability .15. Estimate the
probability that there are 20 or fewer errors in 100 bit transmissions.

7.30. The sum of a list of 64 real numbers is to be computed. Suppose that numbers are rounded
off to the nearest integer so that each number has an error that is uniformly distributed in
the interval (−0.5, 0.5). Use the central limit theorem to estimate the probability that the
total error in the sum of the 64 numbers exceeds 4.

7.31. (a) A fair coin is tossed 100 times. Use the Chernoff bound to estimate the probability that
the number of heads is greater than 90. Compare to an estimate using the central limit
theorem.
(b) Repeat part a for $n = 1000$ and the probability that the number of heads is greater
than 650.

7.32. A binary transmission channel introduces bit errors with probability .01. Use the Chernoff
bound to estimate the probability that there are more than 3 errors in 100 bit
transmissions. Compare to an estimate using the central limit theorem.

7.33. (a) When you play the rock/paper/scissors game against your sister you lose with probability
3/5. Use the Chernoff bound to estimate the probability that you win more
than half of 20 games played.
(b) Repeat for 100 games.
(c) Use trial and error to find the number of games $n$ that need to be played so that the
probability that your sister wins more than 1/2 the games is 90%.

7.34. Show that the Chernoff bound for $X$, a Poisson random variable with mean $\alpha$, is
$P[X \ge a] \le e^{-\alpha}(e\alpha)^a/a^a$ for $a > \alpha$. Hint: Use $E[e^{sX}] = e^{\alpha(e^s - 1)}$.

7.35. Redo Problem 7.26 using the Chernoff bound.

7.36. Show that the Chernoff bound for $X$, a Gaussian random variable with mean $\mu$ and
variance $\sigma^2$, is $P[X \ge a] \le e^{-(a-\mu)^2/2\sigma^2}$, $a > \mu$. Hint: Use $E[e^{sX}] = e^{\mu s + \sigma^2 s^2/2}$.

7.37. Compare the Chernoff bound for the Gaussian random variable with the estimates
provided by Eq. (4.54).

7.38. (a) Find the Chernoff bound for the exponential random variable with rate $\lambda$.
(b) Compare the exact probability of $P[X \ge k/\lambda]$ with the Chernoff bound.

7.39. (a) Generalize the approach in Problem 7.38 to find the Chernoff bound for a gamma
random variable with parameters $\lambda$ and $\alpha$.
(b) Use the result of part a to obtain the Chernoff bound for a chi-square random
variable with $k$ degrees of freedom.


*Section 7.4: Convergence of Sequences of Random Variables


7.40. Let $U_n(\zeta)$, $W_n(\zeta)$, $Y_n(\zeta)$, and $Z_n(\zeta)$ be the sequences of random variables defined in
Example 7.18.

(a) Plot the sequence of functions of $\zeta$ associated with each sequence of random variables.
(b) For $\zeta = 1/4$, plot the associated sample sequence.

7.41. Let $\zeta$ be selected at random from the interval S = [0, 1], and let the probability that $\zeta$
is in a subinterval of S be given by the length of the subinterval. Define the following
sequences of random variables for $n \ge 1$:

$$X_n(\zeta) = \zeta^n, \quad Y_n(\zeta) = \cos^2 2\pi\zeta, \quad Z_n(\zeta) = \cos^n 2\pi\zeta.$$

Do the sequences converge, and if so, in what sense and to what limiting random variable?

7.42. Let $b_i$, $i \ge 1$, be a sequence of iid, equiprobable Bernoulli random variables, and let $\zeta$ be
the number between [0, 1] determined by the binary expansion

$$\zeta = \sum_{i=1}^{\infty} b_i 2^{-i}.$$

(a) Explain why $\zeta$ is uniformly distributed in [0, 1].

(b) How would you use this definition of $\zeta$ to generate the sample sequences that occur
in the urn problem of Example 7.20?

7.43. Let $X_n$ be a sequence of iid, equiprobable Bernoulli random variables, and let

$$Y_n = 2^n X_1 X_2 \cdots X_n.$$

(a) Plot a sample sequence. Does this sequence converge almost surely, and if so, to
what limit?
(b) Does this sequence converge in the mean square sense?

7.44. Let $X_n$ be a sequence of iid random variables with mean m and variance $\sigma^2 < \infty$. Let $M_n$
be the associated sequence of arithmetic averages,

$$M_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Show that $M_n$ converges to m in the mean square sense.

7.45. Let $X_n$ and $Y_n$ be two (possibly dependent) sequences of random variables that converge
in the mean square sense to X and Y, respectively. Does the sequence $X_n + Y_n$ converge
in the mean square sense, and if so, to what limit?

7.46. Let $U_n$ be a sequence of iid zero-mean, unit-variance Gaussian random variables. A "low-pass
filter" takes the sequence $U_n$ and produces the sequence

$$X_n = \frac{1}{2}(U_n + U_{n-1}).$$

(a) Does this sequence converge in the mean square sense?

(b) Does it converge in distribution?

7.47. Does the sequence of random variables introduced in Example 7.20 converge in the
mean square sense?

7.48. Customers arrive at an automated teller machine at discrete instants of time, n = 1, 2, ....
The number of customer arrivals in a time instant is a Bernoulli random variable with
parameter p, and the sequence of arrivals is iid. Assume the machine services a customer in
less than one time unit. Let $X_n$ be the total number of customers served by the machine up
to time n. Suppose that the machine fails at time N, where N is a geometric random variable
with mean 100, so that the customer count remains at $X_N$ thereafter.

(a) Sketch a sample sequence for $X_n$.

(b) Do the sample sequences converge almost surely, and if so, to what limit?

(c) Do the sample sequences converge in the mean square sense?

7.49. Show that the sequence $Y_n(\zeta)$ defined in Example 7.18 converges in distribution.

7.50. Let $X_n$ be a sequence of Laplacian random variables with parameter $\alpha = n$. Does this
sequence converge in distribution?

*Section 7.5: Long-Term Arrival Rates and Associated Averages


7.51. The customer arrival times at a bus depot are iid exponential random variables with
mean 1 minute. Suppose that buses leave as soon as 30 seats are full. At what rate do
buses leave the depot?

7.52. A faulty clock ticks forward every minute with probability p = 0.1 and it does not tick
forward with probability 1 - p. What is the rate at which this clock moves forward?

7.53. (a) Show that $\{N(t) \ge n\}$ and $\{S_n \le t\}$ are equivalent events.

(b) Use part a to find P[N(t) = n] when the $X_i$ are iid exponential random variables
with mean $1/\alpha$.

7.54. Explain why the following are not equivalent events:

(a) $\{N(t) \le n\}$ and $\{S_n \ge t\}$.

(b) $\{N(t) > n\}$ and $\{S_n < t\}$.

7.55. A communication channel alternates between periods when it is error free and periods
during which it introduces errors. Assuming that these periods are independent random
variables of means $m_1$ = 100 hours and $m_2$ = 1 minute, respectively, find the long-term
proportion of time during which the channel is error free.

7.56. A worker works at a rate $r_1$ when the boss is around and at a rate $r_2$ when the boss is not
present. Suppose that the sequence of durations of the time periods when the boss is
present and absent are independent random variables with means $m_1$ and $m_2$, respectively.
Find the long-term average rate at which the worker works.

7.57. A computer (repairman) continuously cycles through three tasks (machines). Suppose
that each time the computer services task i, it spends time $X_i$ doing so.

(a) What is the long-term rate at which the computer cycles through the three tasks?

(b) What is the long-term proportion of time spent by the computer servicing task i?

(c) Repeat parts a and b if a random time W is required for the computer (repairman)
to switch (walk) from one task (machine) to another.

7.58. Customers arrive at a phone booth and use the phone for a random time Y, with mean
3 minutes, if the phone is free. If the phone is not free, the customers leave immediately.
Suppose that the time between customer arrivals is an exponential random variable
with mean 10 minutes.

(a) Find the long-term rate at which customers use the phone.

(b) Find the long-term proportion of customers that leave without using the phone.

7.59. The lifetime of a certain system component is an exponential random variable with mean
T = 2 months. Suppose that the component is replaced when it fails or when it reaches
the age of 3T months.

(a) Find the long-term rate at which components are replaced.

(b) Find the long-term rate at which working components are replaced.

7.60. A data compression encoder segments a stream of information bits into patterns as
shown below. Each pattern is then encoded into the codeword shown below.

Pattern   Codeword   Probability
1         100        .1
01        101        .09
001       110        .081
0001      111        .0729
0000      0          .6561

(a) If the information source produces a bit every millisecond, find the rate at which
codewords are produced.

(b) Find the long-term ratio of encoded bits to information bits.

7.61. In Example 7.29 evaluate the proportion of time that the residual lifetime r(t) exceeds c
seconds for the following cases:

(a) $X_i$ iid uniform random variables in the interval [0, 2].

(b) $X_i$ iid exponential random variables with mean 1.

(c) $X_i$ iid Rayleigh random variables with mean 1.

(d) Calculate and compare the mean residual time in each of the above three cases.

7.62. Let the age a(t) of a cycle be defined as the time that has elapsed from the last arrival up
to an arbitrary time instant t. Show that the long-term proportion of time that a(t) exceeds
c seconds is given by Eq. (7.48).

7.63. Suppose that the cost in each cycle grows at a rate proportional to the age a(t) of the
cycle, that is,

$$C_j = \int_0^{X_j} t'\, dt'.$$

(a) Show that $C_j = X_j^2/2$.

(b) Show that the long-term rate at which the cost grows is $E[X^2]/2E[X]$.

(c) Show that the result in part b is also the long-term time average of a(t), that is,

$$\lim_{t\to\infty} \frac{1}{t}\int_0^t a(t')\, dt' = \frac{E[X^2]}{2E[X]}.$$

(d) Explain why the average residual life is also given by the above expression.

7.64. Calculate the mean age and mean residual life in Problem 7.63 in the following cases:

(a) $X_i$ iid uniform random variables in the interval [0, 2].

(b) $X_i$ iid exponential random variables with mean 1.

(c) $X_i$ iid Rayleigh random variables with mean 1.

7.65. (The Regenerative Method) Suppose that a queueing system has the property that when a
customer arrives and finds an empty system, the future behavior of the system is completely
independent of the past. Define a cycle to consist of the time period between two
consecutive customer arrivals to an empty system. Let $N_j$ be the number of customers
served during the jth cycle and let $T_j$ be the total delay of all customers served during the
jth cycle.

(a) Use Theorem 2 to show that the average customer delay is given by E[T]/E[N], that is,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} D_k = \frac{E[T]}{E[N]},$$

where $D_k$ is the delay of the kth customer.

(b) How would you use this result to estimate the average delay in a computer simulation
of a queueing system?

*Section 7.6: Calculating Distributions Using the Discrete Fourier Transform


7.66. Let the discrete random variable X be uniformly distributed in the set {0, 1, 2}.
(a) Find the N = 3 DFT for X.
(b) Use the inverse DFT to recover P[X = 1].

7.67. Let S = X + Y, where X and Y are iid random variables uniformly distributed in the set
{0, 1, 2}.

(a) Find the N = 5 DFT for S.

(b) Use the inverse DFT to find P[S = 2].

7.68. Let X be a binomial random variable with parameters n = 8 and p = 1/2.

(a) Use the FFT to obtain the pmf of X from $\Phi_X(\omega)$.

(b) Use the FFT to obtain the pmf of Z = X + Y where X and Y are iid binomial random
variables with n = 8 and p = 1/2.

7.69. Let $X_i$ be a discrete random variable that is uniformly distributed in the set {0, 1, ..., 9}.
Use the FFT to find the pmf of $S_n = X_1 + \cdots + X_n$ for n = 5 and n = 10. Plot your results
and compare them to Fig. 7.16.

7.70. Let X be the geometric random variable with parameter p = 1/2. Use the FFT to evaluate
Eq. (7.55) to compute $p_k$ for N = 8 and N = 16. Compare the results to those given
by Eq. (7.57).

7.71. Let X be a Poisson random variable with mean L = 5.

(a) Use the FFT to obtain the pmf from $\Phi_X(\omega)$. Find the value of N for which the error
in Eq. (7.55) is less than 1%.

(b) Let $S = X_1 + X_2 + \cdots + X_5$, where the $X_i$ are iid Poisson random variables with
mean L = 5. Use the FFT to compute the pmf of S from $\Phi_X(\omega)$.

7.72. The probability generating function for the number N of customers in a certain queueing
system (the so-called M/D/1 system discussed in Chapter 12) is

$$G_N(z) = \frac{(1 - \rho)(1 - z)}{1 - z e^{\rho(1 - z)}},$$

where $0 \le \rho < 1$. Use the FFT to obtain the pmf of N for $\rho = 1/2$.

7.73. Use the FFT to obtain approximately the pdf of a Laplacian random variable from its
characteristic function. Use the same parameters as in Example 7.33 and compare your
results to those shown in Fig. 7.17.

7.74. Use the FFT to obtain approximately the pdf of Z = X + Y, where X and Y are independent
Laplacian random variables with parameters $\alpha = 1$ and $\alpha = 2$, respectively.

7.75. Use the FFT to obtain approximately the pdf of a zero-mean, unit-variance Gaussian
random variable from its characteristic function. Experiment with the values of N and $\omega_0$
and compare the results given by the FFT with the exact values.

7.76. Figures 7.2 through 7.4 for the cdf of the sum of iid Bernoulli, uniform, and exponential
random variables were obtained using the FFT. Reproduce the results shown in these figures.


Problems Requiring Cumulative Knowledge 


7.77. The number X of type 1 defects in a system is a binomial random variable with parameters
n and p, and the number Y of type 2 defects is binomial with parameters m and r.

(a) Find the probability generating function for the total number of defects in the system.
(b) Find an expression for the probability that the total number of defects is k.

(c) Let n = 32, p = 1/10, and m = 16, r = 1/8. Use the FFT to evaluate the pmf for the
total number of defects in the system.

7.78. Let $U_n$ be a sequence of iid zero-mean, unit-variance Gaussian random variables. A "low-pass
filter" takes the sequence $U_n$ and produces the sequence

$$X_n = \frac{1}{2}U_n + \left(\frac{1}{2}\right)^2 U_{n-1} + \cdots + \left(\frac{1}{2}\right)^n U_1.$$

(a) Find the mean and variance of $X_n$.
(b) Find the characteristic function of $X_n$. What happens as n approaches infinity?
(c) Does this sequence of random variables converge? In what sense?

7.79. Let $S_n$ be the sum of a sequence of $X_i$'s that are jointly Gaussian random variables with
mean $\mu$ and with the covariance function given in Problem 7.2.

(a) Find the characteristic function of $S_n$.
(b) Find the mean and variance of $S_n - S_m$.
(c) Find the joint characteristic function of $S_n$ and $S_m$. Hint: Assuming n > m, condition
on the value of $S_m$.
(d) Does $S_n$ converge in the mean square sense?

7.80. Repeat Problem 7.79 with the sequence of $X_i$'s given as jointly Gaussian random variables
with mean and covariance functions given in Problem 7.3.

7.81. Let $Z_n$ be the sequence of random variables defined in the formulation of the central
limit theorem, Eq. (7.26a). Does $Z_n$ converge in the mean square sense?

7.82. Let $X_n$ be the sequence of independent, identically distributed outputs of an information
source. At time n, the source produces symbols according to the following probabilities:

Symbol   Probability   Codeword
A        1/2           0
B        1/4           10
C        1/8           110
D        1/16          1110
E        1/16          1111

(a) The self-information of the output at time n is defined by the random variable
$Y_n = -\log_2 P[X_n]$. Thus, for example, if the output is C, the self-information is
$-\log_2 1/8 = 3$. Find the mean and variance of $Y_n$. Note that the expected value of
the self-information is equal to the entropy of X (cf. Section 4.10).

(b) Consider the sequence of arithmetic averages of the self-information:

$$S_n = \frac{1}{n}\sum_{k=1}^{n} Y_k.$$

Do the weak law and strong law of large numbers apply to $S_n$?

(c) Now suppose that the outputs of the information source are encoded using the
variable-length binary codewords indicated above. Note that the length of the codewords
corresponds to the self-information of the corresponding symbol. Interpret
the result of part b in terms of the rate at which bits are produced when the above
code is applied to the information source outputs.


CHAPTER 8


Statistics


Probability theory allows us to model situations that exhibit randomness in terms of 
random experiments involving sample spaces, events, and probability distributions. 
The axioms of probability allow us to develop an extensive set of tools for calculating 
probabilities and averages for a wide array of random experiments. The field of statis- 
tics plays the key role of bridging probability models to the real world. In applying 
probability models to real situations, we must perform experiments and gather data to 
answer questions such as: 


• What are the values of parameters, e.g., mean and variance, of a random variable
of interest?

• Are the observed data consistent with an assumed distribution?

• Are the observed data consistent with a given parameter value of a random
variable?


Statistics is concerned with the gathering and analysis of data and with the drawing of 
conclusions or inferences from the data. The methods from statistics provide us with 
the means to answer the above questions. 

In this chapter we first consider the estimation of parameters of random vari- 
ables. We develop methods for obtaining point estimates as well as confidence intervals 
for parameters of interest. We then consider hypothesis testing and develop methods 
that allow us to accept or reject statements about a random variable based on observed 
data. We will apply these methods to determine the goodness of fit of distributions to 
observed data. 

The Gaussian random variable plays a crucial role in statistics. We note that the 
Gaussian random variable is referred to as the normal random variable in the statistics 
literature. 


8.1 SAMPLES AND SAMPLING DISTRIBUTIONS


The origin of the term “statistics” is in the gathering of data about the population in 
a state or locality in order to draw conclusions about properties of the population, 
e.g., potential tax revenue or size of pool of potential army recruits. Typically the
size of a population was too large to make an exhaustive analysis, so statistical inferences
about the entire population were drawn based on observations from a sample
of individuals.

The term population is still used in statistics, but it now refers to the collection 
of all objects or elements under study in a given situation. We suppose that the prop- 
erty of interest is observable and measurable, and that it can be modeled by a random 
variable X. We gather observation data by taking samples from the population. In 
order for inferences about the population to be valid, it is important that the individ- 
uals in the sample be representative of the entire population. In essence, we 
require that the n observations be made from random experiments conducted 
under the same conditions. For this reason we define a random sample
$\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ as consisting of n independent random variables with the same
distribution as X.

Statistical methods invariably involve performing calculations on the observed
data. For example, we might be interested in inferring the values of a certain parameter
$\theta$ of the population, that is, of the random variable X, such as the mean, variance, or
probability of a certain event. We may also be interested in drawing conclusions about
$\theta$ based on $\mathbf{X}_n$. Typically we calculate a statistic based on the random sample
$\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$:


$$\hat{\Theta}(\mathbf{X}_n) = g(X_1, X_2, \ldots, X_n). \tag{8.1}$$


In other words, a statistic is simply a function of the random vector $\mathbf{X}_n$. Clearly the
statistic $\hat{\Theta}$ is itself a random variable, and so is subject to random variability. Therefore
estimates, inferences, and conclusions based on the statistic must be stated in probabilistic
terms.

We have already encountered statistics to estimate important parameters of a 
random variable. The sample mean is used to estimate the expected value of a random 
variable X: 


$$\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j. \tag{8.2}$$


The relative frequency of an event A is a special case of a sample mean and is used to 
estimate the probability of A: 


$$f_A(n) = \frac{1}{n}\sum_{j=1}^{n} I_j(A). \tag{8.3}$$


Other statistics involve estimation of the variance of X, the minimum and maximum of 
X, and the correlation between random variables X and Y. 

The sampling distribution of a statistic $\hat{\Theta}$ is given by its probability distribution
(cdf, pdf, or pmf). The sampling distribution allows us to calculate parameters
of $\hat{\Theta}$, e.g., mean, variance, and moments, as well as probabilities involving $\hat{\Theta}$,
$P[a < \hat{\Theta} < b]$. We will see that the sampling distribution and its parameters allow us to
determine the accuracy and quality of the statistic $\hat{\Theta}$.


Example 8.1 Mean and Variance of the Sample Mean


Suppose that X has expected value $E[X] = \mu$ and variance $\mathrm{VAR}[X] = \sigma_X^2$. Find the mean and
variance of $\hat{\Theta}(\mathbf{X}_n) = \bar{X}_n$, the sample mean.

The expected value of $\bar{X}_n$ is given by:

$$E[\bar{X}_n] = E\left[\frac{1}{n}\sum_{j=1}^{n} X_j\right] = \frac{1}{n}\sum_{j=1}^{n} E[X_j] = \mu. \tag{8.4}$$

The variance of $\bar{X}_n$ is given by:

$$\mathrm{VAR}[\bar{X}_n] = \frac{1}{n^2}\sum_{j=1}^{n} \mathrm{VAR}[X_j] = \frac{\sigma_X^2}{n}, \tag{8.5}$$

since the $X_j$ are iid random variables. Equation (8.4) asserts that the sample mean is centered
about the true mean $\mu$, and Eq. (8.5) states that the sample-mean estimates become clustered
about $\mu$ as n is increased. The Chebyshev inequality then leads to the weak law of large numbers,
which asserts that $\hat{\Theta}(\mathbf{X}_n) = \bar{X}_n$ converges to $\mu$ in probability.
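
Equations (8.4) and (8.5) can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original text: the function name `sample_mean_stats`, the Gaussian choice for X, and the values mu = 2, sigma = 3 are assumptions made only for illustration.

```python
import random
import statistics


def sample_mean_stats(n, trials=5000, mu=2.0, sigma=3.0, seed=1):
    """Return the empirical mean and variance of the sample mean of n iid draws.

    Hypothetical helper: X is taken to be Gaussian(mu, sigma) for illustration.
    """
    rng = random.Random(seed)
    # Each entry is one realization of the sample mean of n observations.
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.fmean(means), statistics.pvariance(means)


if __name__ == "__main__":
    for n in (1, 10, 100):
        m, v = sample_mean_stats(n)
        # Compare the empirical variance with sigma^2 / n from Eq. (8.5).
        print(f"n={n:3d}  mean={m:.3f}  variance={v:.4f}  sigma^2/n={9.0 / n:.4f}")
```

Under these assumptions the empirical mean should stay near 2 for every n, while the empirical variance should shrink roughly as 9/n.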


Example 8.2 Sampling Distribution for the Sample Mean of Gaussian Random 
Variables 


Let X be a Gaussian random variable with expected value $E[X] = \mu$ and variance $\mathrm{VAR}[X] = \sigma_X^2$.
Find the distribution of the sample mean based on iid observations $X_1, X_2, \ldots, X_n$.

If the samples $X_j$ are iid Gaussian random variables, then from Example 6.24 $\bar{X}_n$ is also a
Gaussian random variable with mean and variance given by Eqs. (8.4) and (8.5). We will see that
many important statistical methods involve the following "one-tail" probability for the sample
mean of Gaussian random variables:

$$\alpha = P[\bar{X}_n - \mu > c] = P\left[\frac{\bar{X}_n - \mu}{\sigma_X/\sqrt{n}} > \frac{c}{\sigma_X/\sqrt{n}}\right] = Q\left(\frac{c}{\sigma_X/\sqrt{n}}\right). \tag{8.6}$$

Let $z_\alpha$ be the critical value for the standard (zero-mean, unit-variance) Gaussian random variable
as shown in Fig. 8.1, so that

$$\alpha = Q(z_\alpha) = Q\left(\frac{c}{\sigma_X/\sqrt{n}}\right).$$

The desired value for the constant c in the one-tail probability is:

$$c = \frac{\sigma_X}{\sqrt{n}}\, z_\alpha. \tag{8.7}$$

TABLE 8.1 Critical values for standard Gaussian random variable.

$\alpha$     $z_\alpha$
0.1000   1.2816
0.0500   1.6449
0.0250   1.9600
0.0100   2.3263
0.0050   2.5758
0.0025   2.8070
0.0010   3.0903
0.0005   3.2906
0.0001   3.7191

FIGURE 8.1 Critical value for standard Gaussian random variable.

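Table 8.1 shows common critical values for the Gaussian random variable. Thus for the one-tail
probability with $\alpha = 0.05$, $z_\alpha = 1.6449$ and $c = 1.6449\,\sigma_X/\sqrt{n}$.
In the "two-tail" case we are interested in:

$$1 - \alpha = P[-c < \bar{X}_n - \mu \le c] = P\left[\frac{-c}{\sigma_X/\sqrt{n}} < \frac{\bar{X}_n - \mu}{\sigma_X/\sqrt{n}} \le \frac{c}{\sigma_X/\sqrt{n}}\right] = 1 - 2Q\left(\frac{c}{\sigma_X/\sqrt{n}}\right).$$

Let $\alpha/2 = Q(z_{\alpha/2})$; then the desired value of the constant c is:

$$c = \frac{\sigma_X}{\sqrt{n}}\, z_{\alpha/2}. \tag{8.8}$$

For the two-tail probability with $\alpha = 0.010$, $z_{\alpha/2} = 2.5758$ and $c = 2.5758\,\sigma_X/\sqrt{n}$.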
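
The critical values in Table 8.1 and the constants in Eqs. (8.7) and (8.8) can be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `z_crit`, `one_tail_c`, and `two_tail_c` are illustrative assumptions, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist


def z_crit(alpha):
    """Critical value z_alpha with Q(z_alpha) = alpha for the standard Gaussian."""
    # Q(z) = 1 - Phi(z), so z_alpha is the (1 - alpha) quantile.
    return NormalDist().inv_cdf(1.0 - alpha)


def one_tail_c(alpha, sigma_x, n):
    """Constant c of Eq. (8.7): P[sample mean - mu > c] = alpha."""
    return z_crit(alpha) * sigma_x / sqrt(n)


def two_tail_c(alpha, sigma_x, n):
    """Constant c of Eq. (8.8): P[|sample mean - mu| <= c] = 1 - alpha."""
    return z_crit(alpha / 2.0) * sigma_x / sqrt(n)


if __name__ == "__main__":
    for alpha in (0.1, 0.05, 0.025, 0.01, 0.005):
        print(f"alpha={alpha:.4f}  z_alpha={z_crit(alpha):.4f}")
```

For example, `two_tail_c(0.01, sigma_x, n)` uses the critical value for 0.005, matching the 2.5758 entry of Table 8.1.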


Example 8.3 Sampling Distribution for the Sample Mean, Large n 


When X is not Gaussian but has finite mean and variance, then by the central limit theorem we 
have that for large n, 


$$Z_n = \frac{\bar{X}_n - \mu}{\sigma_X/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma_X} \tag{8.9}$$
has approximately a zero-mean, unit-variance Gaussian distribution. Therefore when the num- 
ber of samples is large, the sample mean is approximately Gaussian. This allows us to compute 
probabilities involving X,, even though we do not know the distribution of X. This result finds 
numerous applications in statistics when the number of samples n is large. 
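
As a rough illustration of this approximation, the sketch below estimates $P[Z_n \le x]$ by simulation when X is exponential, which is clearly non-Gaussian, and compares it with the Gaussian cdf. The function name, the choice n = 50, and the exponential model are assumptions made for the example only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def standardized_mean_cdf(x, n=50, trials=20000, lam=1.0, seed=7):
    """Estimate P[Z_n <= x] for the standardized sample mean of Eq. (8.9).

    Hypothetical setup: X is exponential with rate lam, so mean = std = 1/lam.
    """
    rng = random.Random(seed)
    mu = sigma = 1.0 / lam
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.expovariate(lam) for _ in range(n))
        z = (xbar - mu) / (sigma / sqrt(n))
        hits += z <= x
    return hits / trials


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0, 2.0):
        est = standardized_mean_cdf(x)
        print(f"x={x:+.1f}  simulated={est:.4f}  Gaussian={NormalDist().cdf(x):.4f}")
```

The agreement is only approximate for moderate n; the skewness of the exponential distribution shows up as small discrepancies that shrink as n grows.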


Example 8.4 Sampling Distribution of Binomial Random Variable 


We wish to estimate the probability of error p in a binary communication channel. We transmit a
predetermined sequence of bits and observe the corresponding received sequence to determine the
sequence of transmission errors, $I_1, I_2, \ldots, I_n$, where $I_j$ is the indicator function for the occurrence
of the event A that corresponds to an error in the jth transmission. Let $N_A(n)$ be the total number
of errors. The relative frequency of errors is used to estimate the probability of error p:

$$f_A(n) = \frac{1}{n}\sum_{j=1}^{n} I_j(A) = \frac{N_A(n)}{n}.$$

Assuming that the outcomes of different transmissions are independent, the number of errors
in the n transmissions, $N_A(n)$, is a binomial random variable with parameters n and p. The mean
and variance of $f_A(n)$ are then:

$$E[f_A(n)] = \frac{np}{n} = p \quad \text{and} \quad \mathrm{VAR}[f_A(n)] = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}.$$

Using the approach from Example 7.10, we can bound the variance of $f_A(n)$ by 1/4n, and use the
Chebyshev inequality to estimate the number of samples required so that there is some probability,
say $1 - \alpha$, that $f_A(n)$ is within $\varepsilon$ of p:

$$P[|f_A(n) - p| < \varepsilon] > 1 - \frac{1}{4n\varepsilon^2} = 1 - \alpha.$$

For n large, we can apply the central limit theorem where

$$Z_n = \frac{f_A(n) - p}{\sqrt{1/4n}}$$

is approximately Gaussian with mean zero and unit variance. We then obtain:

$$P[|f_A(n) - p| < \varepsilon] = P[|Z_n| < \varepsilon\sqrt{4n}] \approx 1 - 2Q(\varepsilon\sqrt{4n}) = 1 - \alpha.$$

For example, if $\alpha = 0.05$, then $\varepsilon\sqrt{4n} = z_{\alpha/2} = 1.96$ and $n = 1.96^2/4\varepsilon^2$.
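
To see how much the two approaches differ in practice, the following sketch computes the sample size implied by each. The function names and the example values $\varepsilon = 0.01$, $\alpha = 0.05$ are assumptions chosen for illustration; both calculations use the worst-case variance bound 1/4n from the text.

```python
from math import ceil
from statistics import NormalDist


def n_chebyshev(eps, alpha):
    """Smallest n with 1/(4 n eps^2) <= alpha (Chebyshev bound)."""
    return ceil(1.0 / (4.0 * alpha * eps ** 2))


def n_clt(eps, alpha):
    """Smallest n with eps * sqrt(4n) >= z_{alpha/2} (Gaussian approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil(z ** 2 / (4.0 * eps ** 2))


if __name__ == "__main__":
    eps, alpha = 0.01, 0.05
    print("Chebyshev:", n_chebyshev(eps, alpha))
    print("CLT      :", n_clt(eps, alpha))
```

With these values the Chebyshev bound asks for roughly 50,000 samples while the central limit theorem estimate is under 10,000, which reflects how conservative the Chebyshev inequality is.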


8.2 PARAMETER ESTIMATION


In this section, we consider the problem of estimating a parameter $\theta$ of a random variable
X. We suppose that we have obtained a random sample $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$
consisting of independent, identically distributed versions of X. Our estimator is given
by a function of $\mathbf{X}_n$:


$$\hat{\Theta}(\mathbf{X}_n) = g(X_1, X_2, \ldots, X_n). \tag{8.10}$$


After making our n observations, we have the values $(x_1, x_2, \ldots, x_n)$ and evaluate the
estimate for $\theta$ by a single value $g(x_1, x_2, \ldots, x_n)$. For this reason $\hat{\Theta}(\mathbf{X}_n)$ is called a point
estimator for the parameter $\theta$.
We consider the following three questions: 


1. What properties characterize a good estimator? 
2. How do we determine that an estimator is better than another? 
3. How do we find good estimators? 


In addressing the above questions, we also introduce a variety of useful estimators.


8.2.1 Properties of Estimators


Ideally, a good estimator should be equal to the parameter $\theta$, on the average. We say
that the estimator $\hat{\Theta}$ is an unbiased estimator for $\theta$ if

$$E[\hat{\Theta}] = \theta. \tag{8.11}$$

The bias of any estimator $\hat{\Theta}$ is defined by

$$B[\hat{\Theta}] = E[\hat{\Theta}] - \theta. \tag{8.12}$$

From Eq. (8.4) in Example 8.1, we see that the sample mean is an unbiased estimator for
the mean $\mu$. However, biased estimators are not unusual, as illustrated by the following
example.


Example 8.5 The Sample Variance 


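The sample mean gives us an estimate of the center of mass of observations of a random variable.
We are also interested in the spread of these observations about this center of mass. An obvious
estimator for the variance $\sigma_X^2$ of X is the arithmetic average of the square variation about
the sample mean:

$$\hat{S}^2 = \frac{1}{n}\sum_{j=1}^{n} (X_j - \bar{X}_n)^2 \tag{8.13}$$

where the sample mean is given by:

$$\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j. \tag{8.14}$$

Let's check whether $\hat{S}^2$ is an unbiased estimator. First, we rewrite Eq. (8.13):

$$\begin{aligned}
\hat{S}^2 &= \frac{1}{n}\sum_{j=1}^{n} (X_j - \bar{X}_n)^2 = \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu + \mu - \bar{X}_n)^2 \\
&= \frac{1}{n}\sum_{j=1}^{n} \left\{ (X_j - \mu)^2 + 2(X_j - \mu)(\mu - \bar{X}_n) + (\mu - \bar{X}_n)^2 \right\} \\
&= \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu)^2 + \frac{2}{n}(\mu - \bar{X}_n)\sum_{j=1}^{n} (X_j - \mu) + \frac{1}{n}\sum_{j=1}^{n} (\mu - \bar{X}_n)^2 \\
&= \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu)^2 + \frac{2}{n}(\mu - \bar{X}_n)(n\bar{X}_n - n\mu) + \frac{n(\mu - \bar{X}_n)^2}{n} \\
&= \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu)^2 - 2(\bar{X}_n - \mu)^2 + (\bar{X}_n - \mu)^2 \\
&= \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu)^2 - (\bar{X}_n - \mu)^2.
\end{aligned} \tag{8.15}$$

The expected value of $\hat{S}^2$ is then:

$$E[\hat{S}^2] = \frac{1}{n}\sum_{j=1}^{n} E[(X_j - \mu)^2] - E[(\bar{X}_n - \mu)^2] = \sigma_X^2 - \frac{\sigma_X^2}{n} = \frac{n-1}{n}\,\sigma_X^2 \tag{8.16}$$

where we used Eq. (8.5) for the variance of the sample mean. Equation (8.16) shows that the
simple estimator given by Eq. (8.13) is a biased estimator for the variance. We can obtain an
unbiased estimator for $\sigma_X^2$ by dividing the sum in Eq. (8.15) by n - 1 instead of by n:

$$\hat{\sigma}_n^2 = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X}_n)^2. \tag{8.17}$$

Equation (8.17) is used as the standard estimator for the variance of a random variable.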
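
The bias factor (n - 1)/n in Eq. (8.16) is easy to observe for small n. The following simulation sketch averages both estimators over many samples; the function name, the Gaussian model for X, and the values n = 5 and sigma = 2 are assumptions for illustration. It relies on `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1) from the Python standard library.

```python
import random
import statistics


def average_variance_estimates(n=5, trials=20000, sigma=2.0, seed=3):
    """Average the biased (8.13) and unbiased (8.17) variance estimators.

    Hypothetical setup: X is zero-mean Gaussian with standard deviation sigma.
    """
    rng = random.Random(seed)
    biased, unbiased = [], []
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased.append(statistics.pvariance(x))    # sum of squares divided by n
        unbiased.append(statistics.variance(x))   # sum of squares divided by n - 1
    return statistics.fmean(biased), statistics.fmean(unbiased)


if __name__ == "__main__":
    b, u = average_variance_estimates()
    print(f"biased average   = {b:.3f}  (expected (n-1)/n * sigma^2 = 3.2)")
    print(f"unbiased average = {u:.3f}  (expected sigma^2 = 4.0)")
```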


A second measure of the quality of an estimator is the mean square estimation error:

$$E[(\hat{\Theta} - \theta)^2] = E[(\hat{\Theta} - E[\hat{\Theta}] + E[\hat{\Theta}] - \theta)^2] = \mathrm{VAR}[\hat{\Theta}] + B(\hat{\Theta})^2. \tag{8.18}$$

Obviously a good estimator should have a small mean square estimation error because
this implies that the estimator values are clustered close to $\theta$. If $\hat{\Theta}$ is an unbiased estimator
of $\theta$, then $B[\hat{\Theta}] = 0$ and the mean square error is simply the variance of the estimator $\hat{\Theta}$.
In comparing two unbiased estimators, we clearly prefer the one with the smallest
estimator variance. The comparison of biased estimators with unbiased estimators can
be tricky. It is possible for a biased estimator to have a smaller mean square error than
any unbiased estimator [Hardy]. In such situations the biased estimator may be
preferable.

The observant student will have noted that we already considered the problem of
finding minimum mean square estimators in Chapter 6. In that discussion we were
estimating the value of one random variable Y by a function of one or more observed
random variables $X_1, X_2, \ldots, X_n$. In this section we are estimating a parameter $\theta$ that
is unknown but not random.


Example 8.6 Estimators for the Exponential Random Variable 


The message interarrival times at a message center are exponential random variables with
rate $\lambda$ messages per second. Compare the following two estimators for $\theta = 1/\lambda$, the mean
interarrival time:

$$\hat{\Theta}_1 = \frac{1}{n}\sum_{j=1}^{n} X_j \quad \text{and} \quad \hat{\Theta}_2 = n \cdot \min(X_1, X_2, \ldots, X_n). \tag{8.19}$$

The first estimator is simply the sample mean of the observed interarrival times. The second
estimator uses the fact from Example 6.10 that the minimum of n iid exponential random variables
is itself an exponential random variable with mean interarrival time $1/n\lambda$.

$\hat{\Theta}_1$ is the sample mean, so we know that it is an unbiased estimator and that its mean
square error is:

$$E\left[\left(\hat{\Theta}_1 - \frac{1}{\lambda}\right)^2\right] = \mathrm{VAR}[\hat{\Theta}_1] = \frac{\sigma_X^2}{n} = \frac{1}{n\lambda^2}.$$

On the other hand, $\min(X_1, X_2, \ldots, X_n)$ is an exponential random variable with mean
interarrival time $1/n\lambda$, so

$$E[\hat{\Theta}_2] = E[n \cdot \min(X_1, \ldots, X_n)] = \frac{n}{n\lambda} = \frac{1}{\lambda}.$$

Therefore $\hat{\Theta}_2$ is also an unbiased estimator for $\theta = 1/\lambda$. The mean square error is:

$$E\left[\left(\hat{\Theta}_2 - \frac{1}{\lambda}\right)^2\right] = \mathrm{VAR}[\hat{\Theta}_2] = n^2\, \mathrm{VAR}[\min(X_1, \ldots, X_n)] = \frac{n^2}{n^2\lambda^2} = \frac{1}{\lambda^2}.$$

Clearly, $\hat{\Theta}_1$ is the preferred estimator because it has the smaller mean square estimation error.
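
The two mean square errors derived above can be compared by simulation. This is a sketch under assumed values (rate 2 messages per second, n = 10 observations); the function name `mse_of_estimators` is hypothetical.

```python
import random


def mse_of_estimators(lam=2.0, n=10, trials=20000, seed=5):
    """Empirical mean square error of the two estimators in Eq. (8.19)."""
    rng = random.Random(seed)
    theta = 1.0 / lam  # true mean interarrival time
    se1 = se2 = 0.0
    for _ in range(trials):
        x = [rng.expovariate(lam) for _ in range(n)]
        est1 = sum(x) / n     # sample mean
        est2 = n * min(x)     # n times the minimum
        se1 += (est1 - theta) ** 2
        se2 += (est2 - theta) ** 2
    return se1 / trials, se2 / trials


if __name__ == "__main__":
    mse1, mse2 = mse_of_estimators()
    print(f"sample mean: {mse1:.4f}  (theory 1/(n lam^2) = {1 / (10 * 4.0):.4f})")
    print(f"n * minimum: {mse2:.4f}  (theory 1/lam^2     = {1 / 4.0:.4f})")
```

The second estimator's error does not decrease with n, which is the point of the example: both estimators are unbiased, but only the sample mean improves as more observations are collected.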


A third measure of quality of an estimator pertains to its behavior as the sample
size n is increased. We say that $\hat{\Theta}$ is a consistent estimator if $\hat{\Theta}$ converges to $\theta$ in
probability, that is, as per Eq. (7.21), for every $\varepsilon > 0$,

$$\lim_{n \to \infty} P[|\hat{\Theta} - \theta| > \varepsilon] = 0. \tag{8.20}$$

The estimator $\hat{\Theta}$ is said to be a strongly consistent estimator if $\hat{\Theta}$ converges to $\theta$ almost
surely, that is, with probability 1, cf. Eqs. (7.22) and (7.37). Consistent estimators,
whether biased or unbiased, tend towards the correct value of $\theta$ as n is increased.


Example 8.7 Consistency of Sample Mean Estimator 


The weak law of large numbers states that the sample mean $\bar{X}_n$ converges to $\mu = E[X]$ in
probability. Therefore the sample mean is a consistent estimator. Furthermore, the strong law of large
numbers states that the sample mean converges to $\mu$ with probability 1. Therefore the sample mean is
a strongly consistent estimator.


Example 8.8 Consistency of Sample Variance Estimator 


Consider the unbiased sample variance estimator in Eq. (8.17). It can be shown (see Problem 8.21)
that the variance of $\hat{\sigma}_n^2$ is:

$$\mathrm{VAR}[\hat{\sigma}_n^2] = \frac{1}{n}\left\{ \mu_4 - \frac{n-3}{n-1}\,\sigma^4 \right\} \quad \text{where } \mu_4 = E[(X - m)^4].$$

If the fourth central moment $\mu_4$ is finite, then the above variance term approaches zero as n
increases. By Chebyshev's inequality we have that:

$$P[|\hat{\sigma}_n^2 - \sigma^2| > \varepsilon] \le \frac{\mathrm{VAR}[\hat{\sigma}_n^2]}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

Therefore the sample variance estimator is consistent if $\mu_4$ is finite.


8.2.2 Finding Good Estimators


Ideally we would like to have estimators that are unbiased, have minimum mean 
square error, and are consistent. Unfortunately, there is no guarantee that unbiased 
estimators or consistent estimators exist for all parameters of interest. There is also no 
straightforward method for finding the minimum mean square estimator for arbitrary 
parameters. Fortunately we do have the class of maximum likelihood estimators 
which are relatively easy to work with, have a number of desirable properties for n 
large, and often provide estimators that can be modified to be unbiased and minimum 
variance. The next section deals with maximum likelihood estimation.


8.3 MAXIMUM LIKELIHOOD ESTIMATION


We now consider the maximum likelihood method for finding a point estimator $\hat{\Theta}(\mathbf{X}_n)$
for an unknown parameter $\theta$. In this section we first show how the method works. We
then present several properties that make maximum likelihood estimators very useful
in practice.

The maximum likelihood method selects as its estimate the parameter value
that maximizes the probability of the observed data $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$. Before
introducing the formal method we use an example to demonstrate the basic
approach.


Example 8.9 Poisson Distributed Typos 


Papers submitted by Bob have been found to have a Poisson distributed number of typos with 
mean 1 typo per page, whereas papers prepared by John have a Poisson distributed number of 
typos with mean 5 typos per page. Suppose that a page that was submitted by either Bob or John 
has 2 typos. Who is the likely author? 

In the maximum likelihood approach we first calculate the probability of obtaining the 
given observation for each possible parameter value, thus: 


$$P[X = 2 \mid \theta = 1] = \frac{1^2}{2!}\, e^{-1} = 0.18394$$

$$P[X = 2 \mid \theta = 5] = \frac{5^2}{2!}\, e^{-5} = 0.084224.$$

We then select the parameter value that gives the higher probability for the observation. In this
case $\hat{\Theta}(2) = 1$ gives the higher probability, so the estimator selects Bob as the more likely
author of the page.
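
The comparison in this example amounts to evaluating the Poisson pmf at the observed count for each candidate mean and picking the larger value. A minimal sketch follows; the names `poisson_pmf`, `CANDIDATES`, and `ml_author` are assumptions introduced for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P[X = k] for a Poisson random variable with the given mean."""
    return mean ** k * exp(-mean) / factorial(k)


# Candidate parameter values from the example: mean typos per page.
CANDIDATES = {"Bob": 1.0, "John": 5.0}


def ml_author(k):
    """Return the candidate whose mean maximizes the probability of k typos."""
    return max(CANDIDATES, key=lambda name: poisson_pmf(k, CANDIDATES[name]))


if __name__ == "__main__":
    for k in range(6):
        probs = {name: poisson_pmf(k, m) for name, m in CANDIDATES.items()}
        print(k, {n: round(p, 6) for n, p in probs.items()}, "->", ml_author(k))
```

Running the loop shows where the decision switches: a page with few typos is attributed to Bob, and once the count is large enough the likelihood under mean 5 dominates and the page is attributed to John.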
Let $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$ be the observed values of a random sample for the random
variable X and let $\theta$ be the parameter of interest. The likelihood function of the
sample is a function of $\theta$ defined as follows:

$$l(\mathbf{x}_n; \theta) = l(x_1, x_2, \ldots, x_n; \theta) =
\begin{cases}
p_X(x_1, x_2, \ldots, x_n \mid \theta) & X \text{ discrete random variable} \\
f_X(x_1, x_2, \ldots, x_n \mid \theta) & X \text{ continuous random variable}
\end{cases} \tag{8.21}$$

where $p_X(x_1, x_2, \ldots, x_n \mid \theta)$ and $f_X(x_1, x_2, \ldots, x_n \mid \theta)$ are the joint pmf and joint pdf
evaluated at the observation values if the parameter value is $\theta$. Since the samples
$X_1, X_2, \ldots, X_n$ are iid, we have a simple expression for the likelihood function:

$$p_X(x_1, x_2, \ldots, x_n \mid \theta) = p_X(x_1 \mid \theta)\, p_X(x_2 \mid \theta) \cdots p_X(x_n \mid \theta) = \prod_{j=1}^{n} p_X(x_j \mid \theta) \tag{8.22}$$

and

$$f_X(x_1, x_2, \ldots, x_n \mid \theta) = f_X(x_1 \mid \theta)\, f_X(x_2 \mid \theta) \cdots f_X(x_n \mid \theta) = \prod_{j=1}^{n} f_X(x_j \mid \theta). \tag{8.23}$$

The maximum likelihood method selects the estimator value $\hat{\Theta} = \theta^*$ where $\theta^*$ is the
parameter value that maximizes the likelihood function, that is,

$$l(x_1, x_2, \ldots, x_n; \theta^*) = \max_{\theta}\, l(x_1, x_2, \ldots, x_n; \theta) \tag{8.24}$$

where the maximum is taken over all allowable values of $\theta$. Usually $\theta$ assumes a continuous
set of values, so we find the maximum of the likelihood function over $\theta$ using
standard methods from calculus.

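It is usually more convenient to work with the log likelihood function because we
then work with the sum of terms instead of the product of terms in Eqs. (8.22) and (8.23):

$$L(\mathbf{x}_n \mid \theta) = \ln l(\mathbf{x}_n; \theta)
= \begin{cases}
\sum_{j=1}^{n} \ln p_X(x_j \mid \theta) = \sum_{j=1}^{n} L(x_j \mid \theta) & X \text{ discrete random variable} \\
\sum_{j=1}^{n} \ln f_X(x_j \mid \theta) = \sum_{j=1}^{n} L(x_j \mid \theta) & X \text{ continuous random variable.}
\end{cases} \tag{8.25}$$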


Maximizing the log likelihood function is equivalent to maximizing the likelihood
function since ln(x) is an increasing function of x. We obtain the maximum likelihood
estimate by finding the value $\theta^*$ for which:

$$\frac{\partial}{\partial\theta} L(\mathbf{x}_n \mid \theta) = \frac{\partial}{\partial\theta} \ln l(\mathbf{x}_n \mid \theta) = 0. \tag{8.26}$$


Example 8.10 Estimation of p for a Bernoulli Random Variable


Suppose we perform n independent observations of a Bernoulli random variable with
probability of success p. Find the maximum likelihood estimate for p.

Let $\mathbf{i}_n = (i_1, i_2, \ldots, i_n)$ be the observed outcomes of the n Bernoulli trials.
The pmf for an individual outcome can be written as follows:

$$p_X(i_j \mid p) = p^{i_j}(1 - p)^{1 - i_j} = \begin{cases} p & \text{if } i_j = 1 \\ 1 - p & \text{if } i_j = 0. \end{cases}$$

The log likelihood function is:

$$\ln l(i_1, i_2, \ldots, i_n; p) = \sum_{j=1}^{n} \ln p_X(i_j \mid p) = \sum_{j=1}^{n} \bigl( i_j \ln p + (1 - i_j)\ln(1 - p) \bigr). \tag{8.27}$$

We take the first derivative with respect to p and set the result equal to zero:

$$0 = \frac{d}{dp} \ln l(i_1, i_2, \ldots, i_n; p) = \frac{1}{p}\sum_{j=1}^{n} i_j - \frac{1}{1 - p}\sum_{j=1}^{n} (1 - i_j)
= -\frac{n}{1 - p} + \frac{1}{p(1 - p)}\sum_{j=1}^{n} i_j. \tag{8.28}$$

Solving for p, we obtain:

$$p^* = \frac{1}{n}\sum_{j=1}^{n} i_j.$$

Therefore the maximum likelihood estimator for p is the relative frequency of successes, which is
a special case of the sample mean. From the previous section we know that the sample mean
estimator is unbiased and consistent.
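
As a sanity check on Example 8.10, the following sketch compares the closed-form estimate with a brute-force maximization of the log likelihood in Eq. (8.27) over a grid. The data set and the function name `bernoulli_log_likelihood` are hypothetical, chosen only for illustration.

```python
import math

def bernoulli_log_likelihood(outcomes, p):
    """Log likelihood of Eq. (8.27) for 0/1 outcomes and success probability p."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return successes * math.log(p) + failures * math.log(1.0 - p)

# Hypothetical data: 6 successes in 10 trials.
outcomes = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]

# Closed-form maximum likelihood estimate: the relative frequency of successes.
p_closed_form = sum(outcomes) / len(outcomes)

# Brute-force check: evaluate the log likelihood on a grid of p values in (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(outcomes, p))

print(p_closed_form, p_grid)
```

The grid search is only a check; the calculus argument in the example gives the maximizer exactly.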


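Example 8.11 Estimation of α for Poisson Random Variable


Suppose we perform n independent observations of a Poisson random variable with mean $\alpha$.
Find the maximum likelihood estimate for $\alpha$.

Let the counts in the n independent trials be given by $k_1, k_2, \ldots, k_n$. The probability of
observing $k_j$ events in the jth trial is:

$$p_X(k_j \mid \alpha) = \frac{\alpha^{k_j}}{k_j!}\, e^{-\alpha}.$$

The log likelihood function is then:

$$\ln l(k_1, k_2, \ldots, k_n; \alpha) = \sum_{j=1}^{n} \bigl( k_j \ln\alpha - \alpha - \ln k_j! \bigr)
= \ln\alpha \sum_{j=1}^{n} k_j - n\alpha - \sum_{j=1}^{n} \ln k_j!.$$

To find the maximum, we take the first derivative with respect to $\alpha$ and set it equal to zero:

$$0 = \frac{d}{d\alpha} \ln l(k_1, k_2, \ldots, k_n; \alpha) = \frac{1}{\alpha}\sum_{j=1}^{n} k_j - n. \tag{8.29}$$

Solving for $\alpha$, we obtain:

$$\alpha^* = \frac{1}{n}\sum_{j=1}^{n} k_j.$$

The maximum likelihood estimator for $\alpha$ is the sample mean of the event counts.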


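Example 8.12 Estimation of Mean and Variance for Gaussian Random Variable


Let $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$ be the observed values of a random sample for a
Gaussian random variable X for which we wish to estimate two parameters: the mean
$\theta_1 = \mu$ and variance $\theta_2 = \sigma_X^2$. The likelihood function is a function of two
parameters $\theta_1$ and $\theta_2$, and we must simultaneously maximize the likelihood with
respect to these two parameters.

The pdf for the jth observation is given by:

$$f_X(x_j \mid \theta_1, \theta_2) = \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-(x_j - \theta_1)^2 / 2\theta_2}$$

where we have replaced the mean and variance by $\theta_1$ and $\theta_2$, respectively. The log
likelihood function is given by:

$$\ln l(x_1, x_2, \ldots, x_n; \theta_1, \theta_2) = \sum_{j=1}^{n} \ln f_X(x_j \mid \theta_1, \theta_2)
= -\frac{n}{2}\ln 2\pi\theta_2 - \sum_{j=1}^{n} \frac{(x_j - \theta_1)^2}{2\theta_2}.$$

We take derivatives with respect to $\theta_1$ and $\theta_2$ and set the results equal to zero:

$$0 = \frac{\partial}{\partial\theta_1} \sum_{j=1}^{n} \ln f_X(x_j \mid \theta_1, \theta_2) = \sum_{j=1}^{n} \frac{x_j - \theta_1}{\theta_2}
= \frac{1}{\theta_2}\left[ \sum_{j=1}^{n} x_j - n\theta_1 \right] \tag{8.30}$$

and

$$0 = \frac{\partial}{\partial\theta_2} \sum_{j=1}^{n} \ln f_X(x_j \mid \theta_1, \theta_2) = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{j=1}^{n} (x_j - \theta_1)^2
= -\frac{1}{2\theta_2}\left[ n - \frac{1}{\theta_2}\sum_{j=1}^{n} (x_j - \theta_1)^2 \right]. \tag{8.31}$$

Equations (8.30) and (8.31) can be solved for $\theta_1^*$ and $\theta_2^*$, respectively, to obtain

$$\theta_1^* = \frac{1}{n}\sum_{j=1}^{n} x_j \tag{8.32}$$

and

$$\theta_2^* = \frac{1}{n}\sum_{j=1}^{n} (x_j - \theta_1^*)^2. \tag{8.33}$$

Thus, $\theta_1^*$ is given by the sample mean and $\theta_2^*$ is given by the biased sample
variance discussed in Example 8.5. It is easy to show that as n becomes large, $\theta_2^*$
approaches the unbiased $\hat{\sigma}_n^2$.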
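
The relation between the two variance estimates in Example 8.12 can be illustrated as follows. The measurement values are hypothetical; the sketch assumes only Eqs. (8.32) and (8.33) and the Python standard library.

```python
import statistics

# Hypothetical measurements, used only to illustrate Eqs. (8.32) and (8.33).
samples = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2]
n = len(samples)

mean_mle = sum(samples) / n                                # Eq. (8.32): sample mean
var_mle = sum((x - mean_mle) ** 2 for x in samples) / n   # Eq. (8.33): divides by n (biased)
var_unbiased = statistics.variance(samples)                # divides by n - 1 (unbiased)

# The two variance estimates differ only by the factor (n - 1) / n,
# which tends to 1 as n grows.
print(mean_mle, var_mle, var_unbiased, var_mle / var_unbiased)
```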


The maximum likelihood estimator possesses an important invariance property
that, in general, is not satisfied by other estimators. Suppose that instead of the
parameter $\theta$, we are interested in estimating a function of $\theta$, say $h(\theta)$,
which we assume is invertible. It can be shown then that if $\theta^*$ is the maximum likelihood
estimate of $\theta$, then $h(\theta^*)$ is the maximum likelihood estimate for $h(\theta)$.
(See Problem 8.34.) As an example, consider the exponential random variable. Suppose that
$\lambda^*$ is the maximum likelihood estimate for the rate $\lambda$ of an exponential random
variable. Suppose we are instead interested in $h(\lambda) = 1/\lambda$, the mean interarrival
time of the exponential random variable. The invariance result of the maximum likelihood
estimate implies that the maximum likelihood estimate of the mean is then
$h(\lambda^*) = 1/\lambda^*$.


Cramer-Rao Inequality¹


In general, we would like to find the unbiased estimator $\hat{\Theta}$ with the smallest
possible variance. This estimator would produce the most accurate estimates in the
sense of being tightly clustered around the true value $\theta$. The Cramer-Rao inequality
addresses this question in two steps. First, it provides a lower bound to the minimum
possible variance achievable by any unbiased estimator. This bound provides
a benchmark for assessing all unbiased estimators of $\theta$. Second, if an unbiased
estimator achieves the lower bound then it has the smallest possible variance and
mean square error. Furthermore, this unbiased estimator can be found using the
maximum likelihood method.

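Since the random sample $\mathbf{X}_n$ is a vector random variable, we expect that the estimator
$\hat{\Theta}(\mathbf{X}_n)$ will exhibit some unavoidable random variation and hence will have
nonzero variance. Is there a lower limit to how small this variance can be? The answer is yes,
and the lower bound is given by the reciprocal of the Fisher information, which is defined as
follows:

$$I_n(\theta) = E\left[ \left\{ \frac{\partial L(\mathbf{X}_n \mid \theta)}{\partial\theta} \right\}^2 \right]
= E\left[ \left\{ \frac{\partial \ln f_X(X_1, X_2, \ldots, X_n \mid \theta)}{\partial\theta} \right\}^2 \right]. \tag{8.34}$$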


The pdf in Eq. (8.34) is replaced by a pmf if X is a discrete random variable. The term
inside the braces is called the score function, which is defined as the partial derivative of the
log likelihood function with respect to the parameter $\theta$. Note that the score function is a
function of the vector random variable $\mathbf{X}_n$. We have already seen this function when
finding maximum likelihood estimators. The expected value of the score function is zero since:

$$E\left[ \frac{\partial}{\partial\theta} L(\mathbf{X}_n \mid \theta) \right]
= E\left[ \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n \mid \theta) \right]
= \int_{\mathbf{x}_n} \frac{1}{f_X(\mathbf{x}_n \mid \theta)} \frac{\partial f_X(\mathbf{x}_n \mid \theta)}{\partial\theta}\, f_X(\mathbf{x}_n \mid \theta)\, d\mathbf{x}_n$$

$$= \int_{\mathbf{x}_n} \frac{\partial f_X(\mathbf{x}_n \mid \theta)}{\partial\theta}\, d\mathbf{x}_n
= \frac{\partial}{\partial\theta} \int_{\mathbf{x}_n} f_X(\mathbf{x}_n \mid \theta)\, d\mathbf{x}_n
= \frac{\partial}{\partial\theta}\, 1 = 0, \tag{8.35}$$

where we assume that the order of the partial derivative and integration can be exchanged.
Therefore $I_n(\theta)$ is equal to the variance of the score function.

¹As a reminder, we note that this section (and other starred sections) presents advanced material and can be
skipped without loss of continuity.

The score function measures the rate at which the log likelihood function changes
as $\theta$ varies. If $L(\mathbf{X}_n \mid \theta)$ tends to change quickly about the value
$\theta_0$ for most observations of $\mathbf{X}_n$, we can expect that: (1) the Fisher information
will tend to be large since the argument inside the expected value in Eq. (8.34) will be large;
(2) small departures from the value $\theta_0$ will be readily discernible in the observed
statistics because the underlying pdf is changing quickly. On the other hand, if the likelihood
function changes slowly about $\theta_0$, then the Fisher information will be small. In addition,
significantly different values of $\theta_0$ may have quite similar likelihood functions, making
it difficult to distinguish among parameter values from the observed data. In summary, larger
values of $I_n(\theta)$ should allow for better performing estimators that will have smaller
variances.

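The Fisher information has the following equivalent but more useful form when the
pdf $f_X(x_1, x_2, \ldots, x_n \mid \theta)$ satisfies certain additional conditions (see Problem 8.35):

$$I_n(\theta) = -E\left[ \frac{\partial^2 \ln f_X(X_1, X_2, \ldots, X_n \mid \theta)}{\partial\theta^2} \right]
= -E\left[ \frac{\partial^2 L(\mathbf{X}_n \mid \theta)}{\partial\theta^2} \right]. \tag{8.36}$$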


Example 8.13 Fisher Information for Bernoulli Random Variable


From Eqs. (8.27) and (8.28), the score and its derivative for the Bernoulli random variable are
given by:

$$\frac{\partial}{\partial p} \ln l(i_1, i_2, \ldots, i_n; p) = \frac{1}{p}\sum_{j=1}^{n} i_j - \frac{1}{1 - p}\sum_{j=1}^{n} (1 - i_j)$$

and

$$\frac{\partial^2}{\partial p^2} \ln l(i_1, i_2, \ldots, i_n; p) = -\frac{1}{p^2}\sum_{j=1}^{n} i_j - \frac{1}{(1 - p)^2}\sum_{j=1}^{n} (1 - i_j).$$

The Fisher information, as given by Eq. (8.36), is then:

$$I_n(p) = E\left[ \frac{1}{p^2}\sum_{j=1}^{n} I_j + \frac{1}{(1 - p)^2}\sum_{j=1}^{n} (1 - I_j) \right]
= \frac{np}{p^2} + \frac{n - np}{(1 - p)^2} = \frac{n}{p} + \frac{n}{1 - p} = \frac{n}{p(1 - p)}.$$

Note that $I_n(p)$ is smallest near p = 1/2, and that it increases as p approaches 0 or 1, so p is
easier to estimate accurately at the extreme values of p. Note as well that the Fisher information
is proportional to the number of samples, that is, more samples make it easier to estimate p.
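
The closed form in Example 8.13 can be cross-checked against the definition in Eq. (8.34), which computes the Fisher information as the expected squared score. This is a minimal sketch; the function name `bernoulli_fisher_information` is hypothetical.

```python
def bernoulli_fisher_information(p, n=1):
    """I_n(p) computed as the expected squared score, as in Eq. (8.34)."""
    score_success = 1.0 / p            # d/dp ln p        (outcome 1)
    score_failure = -1.0 / (1.0 - p)   # d/dp ln(1 - p)   (outcome 0)
    # Expected squared score of a single observation.
    single = p * score_success ** 2 + (1.0 - p) * score_failure ** 2
    # The scores of iid observations add and have zero mean,
    # so their variances add as well.
    return n * single

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    # Compare the definition with the closed form n / (p (1 - p)) for n = 20.
    print(p, bernoulli_fisher_information(p, n=20), 20 / (p * (1 - p)))
```

The printed values agree for each p and are smallest at p = 0.5, as noted in the example.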


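Example 8.14 Fisher Information for an Exponential Random Variable

The log likelihood function for the n samples of an exponential random variable is:

$$\ln l(x_1, x_2, \ldots, x_n; \lambda) = \sum_{j=1}^{n} \ln \lambda e^{-\lambda x_j} = \sum_{j=1}^{n} (\ln\lambda - \lambda x_j).$$

The score for n observations of an exponential random variable and its derivative are given by:

$$\frac{\partial}{\partial\lambda} \ln l(x_1, x_2, \ldots, x_n; \lambda) = \frac{n}{\lambda} - \sum_{j=1}^{n} x_j$$

and

$$\frac{\partial^2}{\partial\lambda^2} \ln l(x_1, x_2, \ldots, x_n; \lambda) = -\frac{n}{\lambda^2}.$$

The Fisher information is then:

$$I_n(\lambda) = E\left[ \frac{n}{\lambda^2} \right] = \frac{n}{\lambda^2}.$$

Note that $I_n(\lambda)$ decreases with increasing $\lambda$.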


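We are now ready to state the Cramer-Rao inequality.


Theorem Cramer-Rao Inequality


Let $\hat{\Theta}(\mathbf{X}_n)$ be any unbiased estimator for the parameter $\theta$ of X, then
under certain regularity conditions² on the pdf $f_X(x_1, x_2, \ldots, x_n \mid \theta)$,

(a) $$\mathrm{VAR}[\hat{\Theta}(\mathbf{X}_n)] \ge \frac{1}{I_n(\theta)}, \tag{8.37}$$

(b) with equality being achieved if and only if

$$\frac{\partial}{\partial\theta} \ln f_X(x_1, x_2, \ldots, x_n; \theta) = \{\hat{\Theta}(\mathbf{x}) - \theta\}\, k(\theta). \tag{8.38}$$


²See [Bickel, p. 179].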

The Cramer-Rao lower bound confirms our conjecture that the variance of unbiased
estimators must be bounded below by a nonzero value. If the Fisher information
is high, then the lower bound is small, suggesting that low variance, and hence accurate,
estimators are possible. The term $1/I_n(\theta)$ serves as a reference point for the variance of
all unbiased estimators, and the ratio $(1/I_n(\theta))/\mathrm{VAR}[\hat{\Theta}]$ provides a
measure of efficiency of an unbiased estimator. We say that an unbiased estimator is efficient
if it achieves the lower bound.

Assume that Eq. (8.38) is satisfied. The maximum likelihood estimator must then
satisfy Eq. (8.26), and therefore

$$0 = \frac{\partial}{\partial\theta} \ln f_X(x_1, x_2, \ldots, x_n; \theta^*) = \{\hat{\Theta}(\mathbf{x}) - \theta^*\}\, k(\theta^*). \tag{8.39}$$

We discard the case $k(\theta^*) = 0$, and conclude that, in general, we must have
$\theta^* = \hat{\Theta}(\mathbf{x})$. Therefore, if an efficient estimator exists then it can be
found using the maximum likelihood method. If an efficient estimator does not exist, then the
lower bound in Eq. (8.37) is not achieved by any unbiased estimator.

In Examples 8.10 and 8.11 we derived unbiased maximum likelihood estima- 
tors for Bernoulli and for Poisson random variables. We note that in these examples 
the score function in the maximum likelihood equations (Eqs. 8.28 and 8.29) can be 
rearranged to have the form given in Eq. (8.39). Therefore we conclude that these 
estimators are efficient. 
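
The rearrangement referred to above can be written out explicitly. Starting from Eq. (8.28), the Bernoulli score is

$$\frac{d}{dp} \ln l(i_1, \ldots, i_n; p) = -\frac{n}{1 - p} + \frac{1}{p(1 - p)}\sum_{j=1}^{n} i_j
= \frac{n}{p(1 - p)}\left\{ \frac{1}{n}\sum_{j=1}^{n} i_j - p \right\},$$

which has the form $\{\hat{\Theta}(\mathbf{x}) - \theta\}k(\theta)$ of Eq. (8.38) with the relative frequency as the estimator and $k(p) = n/(p(1 - p))$. Similarly, starting from Eq. (8.29), the Poisson score is

$$\frac{d}{d\alpha} \ln l(k_1, \ldots, k_n; \alpha) = \frac{1}{\alpha}\sum_{j=1}^{n} k_j - n
= \frac{n}{\alpha}\left\{ \frac{1}{n}\sum_{j=1}^{n} k_j - \alpha \right\},$$

which has the same form with the sample mean as the estimator and $k(\alpha) = n/\alpha$.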


Example 8.15 Cramer-Rao Lower Bound for Bernoulli Random Variable 
From Example 8.13, the Fisher information for the Bernoulli random variable is

$$I_n(p) = \frac{n}{p(1 - p)}.$$

Therefore the Cramer-Rao lower bound for the variance of the sample mean estimator for p is:

$$\mathrm{VAR}[\hat{\Theta}] \ge \frac{1}{I_n(p)} = \frac{p(1 - p)}{n}.$$

The relative frequency estimator for p achieves this lower bound.
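
A small Monte Carlo experiment can illustrate that the relative frequency estimator attains the bound p(1 − p)/n. The sketch below is illustrative only: the parameter values, the seed, and the function name `relative_frequency_variance` are arbitrary assumptions, and the empirical variance matches the bound only up to simulation error.

```python
import random
import statistics

def relative_frequency_variance(p, n, trials, seed=0):
    """Empirical variance of the relative frequency over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # One experiment: n Bernoulli trials with success probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)
    return statistics.pvariance(estimates)

p, n = 0.3, 50
bound = p * (1 - p) / n    # Cramer-Rao lower bound 1 / I_n(p)
empirical = relative_frequency_variance(p, n, trials=20000)

print(bound, empirical)    # the two values should be close
```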


Proof of Cramer-Rao Inequality 


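The proof of the Cramer-Rao inequality involves an application of the Schwarz
inequality. We assume that the score function exists and is finite. Consider the covariance
of $\hat{\Theta}(\mathbf{X}_n)$ and the score function:

$$\mathrm{COV}\!\left( \hat{\Theta}(\mathbf{X}_n), \frac{\partial}{\partial\theta} L(\mathbf{X}_n; \theta) \right)
= E\!\left[ \hat{\Theta}(\mathbf{X}_n) \frac{\partial}{\partial\theta} L(\mathbf{X}_n; \theta) \right]
- E[\hat{\Theta}(\mathbf{X}_n)]\, E\!\left[ \frac{\partial}{\partial\theta} L(\mathbf{X}_n; \theta) \right]$$

$$= E\!\left[ \hat{\Theta}(\mathbf{X}_n) \frac{\partial}{\partial\theta} L(\mathbf{X}_n; \theta) \right],$$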
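where we used Eq. (5.30) and the fact that the expected value of the score is zero
(Eq. 8.35). Next we evaluate the above expected value:

$$\mathrm{COV}\!\left( \hat{\Theta}(\mathbf{X}_n), \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right)
= E\!\left[ \hat{\Theta}(\mathbf{X}_n) \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right]$$

$$= \int_{\mathbf{x}_n} \hat{\Theta}(\mathbf{x}_n) \left\{ \frac{1}{f_X(\mathbf{x}_n; \theta)} \frac{\partial}{\partial\theta} f_X(\mathbf{x}_n; \theta) \right\} f_X(\mathbf{x}_n; \theta)\, d\mathbf{x}_n$$

$$= \int_{\mathbf{x}_n} \hat{\Theta}(\mathbf{x}_n) \frac{\partial}{\partial\theta} f_X(\mathbf{x}_n; \theta)\, d\mathbf{x}_n
= \frac{\partial}{\partial\theta} \int_{\mathbf{x}_n} \hat{\Theta}(\mathbf{x}_n)\, f_X(\mathbf{x}_n; \theta)\, d\mathbf{x}_n.$$

In the last step we assume that the integration and the partial derivative with respect to
$\theta$ can be interchanged. (The regularity conditions required by the theorem are needed to
ensure that this step is valid.) Note that the integral in the last expression is
$E[\hat{\Theta}(\mathbf{X}_n)] = \theta$, so

$$\mathrm{COV}\!\left( \hat{\Theta}(\mathbf{X}_n), \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right) = \frac{\partial}{\partial\theta}\,\theta = 1.$$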


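Next we apply the Schwarz inequality to the covariance:

$$1 = \mathrm{COV}\!\left( \hat{\Theta}(\mathbf{X}_n), \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right)
\le \sqrt{ \mathrm{VAR}[\hat{\Theta}(\mathbf{X}_n)]\; \mathrm{VAR}\!\left[ \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right] }.$$

Taking the square of both sides we conclude that:

$$1 \le \mathrm{VAR}[\hat{\Theta}(\mathbf{X}_n)]\; \mathrm{VAR}\!\left[ \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right]$$

and finally

$$\mathrm{VAR}[\hat{\Theta}(\mathbf{X}_n)] \ge 1 \Big/ \mathrm{VAR}\!\left[ \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right] = 1/I_n(\theta).$$

The last step uses the fact that the Fisher information is the variance of the score
function. This completes the proof of part a.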


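Equality holds in the Schwarz inequality when the random variables in the
variances are proportional to each other, that is:

$$k(\theta)\bigl[ \hat{\Theta}(\mathbf{X}_n) - E[\hat{\Theta}(\mathbf{X}_n)] \bigr] = k(\theta)\bigl[ \hat{\Theta}(\mathbf{X}_n) - \theta \bigr]$$

$$= \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) - E\!\left[ \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta) \right]
= \frac{\partial}{\partial\theta} \ln f_X(\mathbf{X}_n; \theta),$$

where we noted that the expected value of the score function is 0 and that the
estimator $\hat{\Theta}(\mathbf{X}_n)$ is unbiased. This completes the proof of part b.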


Asymptotic Properties of Maximum Likelihood Estimators 


Maximum likelihood estimators satisfy the following asymptotic properties that make
them very useful when the number of samples is large.


1. Maximum likelihood estimates are consistent:

$$\lim_{n \to \infty} \theta_n^* = \theta_0, \quad \text{where } \theta_0 \text{ is the true parameter value.}$$

2. For n large, the maximum likelihood estimate $\theta_n^*$ is asymptotically Gaussian
distributed, that is, $\sqrt{n}(\theta_n^* - \theta_0)$ has a Gaussian distribution with zero mean
and variance $1/I_1(\theta_0)$, where $I_1(\theta_0)$ is the Fisher information of a single
observation.

3. Maximum likelihood estimates are asymptotically efficient:

$$\lim_{n \to \infty} \frac{1/I_n(\theta_0)}{\mathrm{VAR}[\theta_n^*]} = 1. \tag{8.40}$$


The consistency property (1) implies that maximum likelihood estimates will be
close to the true value for large n, and asymptotic efficiency (3) implies that the
variance becomes as small as possible. The asymptotic Gaussian distributed property (2) is
very useful because it allows us to evaluate the probabilities involving the maximum
likelihood estimator.


Example 8.16 Bernoulli Random Variable 


Find the distribution of the sample mean estimator for p for n large.

If $p_0$ is the true value of the parameter of the Bernoulli random variable, then
$I_1(p_0) = (p_0(1 - p_0))^{-1}$. Therefore, $\sqrt{n}(p^* - p_0)$ has approximately a Gaussian
pdf with mean zero and variance $p_0(1 - p_0)$, so that the estimation error $p^* - p_0$ has
mean zero and variance $p_0(1 - p_0)/n$. This is in agreement with Example 7.14 where we
discussed the application of the central limit theorem to the sample mean of Bernoulli random
variables.


The asymptotic properties of the maximum likelihood estimator result from the
law of large numbers and the central limit theorem. In the remainder of this section we
indicate how these results come about. See [Cramer] for a proof of these results.
Consider the arithmetic average of the log likelihood function for n samples of the random
variable X:

$$\frac{1}{n} L(\mathbf{X}_n \mid \theta) = \frac{1}{n}\sum_{j=1}^{n} L(X_j \mid \theta) = \frac{1}{n}\sum_{j=1}^{n} \ln f_X(X_j \mid \theta). \tag{8.41}$$

We have intentionally written the log likelihood as a function of the random variables
$X_1, X_2, \ldots, X_n$. Clearly this arithmetic average is the sample mean of n independent
observations of the following random variable:

$$Y = g(X) = L(X \mid \theta) = \ln f_X(X \mid \theta).$$

The random variable Y has mean given by:

$$E[Y] = E[g(X)] = E[L(X \mid \theta)] = E[\ln f_X(X \mid \theta)] \triangleq L(\theta). \tag{8.42}$$

Assuming that Y satisfies the conditions for the law of large numbers, we then have:

$$\frac{1}{n} L(\mathbf{X}_n \mid \theta) = \frac{1}{n}\sum_{j=1}^{n} \ln f_X(X_j \mid \theta) = \frac{1}{n}\sum_{j=1}^{n} Y_j \to E[Y] = L(\theta). \tag{8.43}$$

The function $L(\theta)$ can be viewed as a limiting form of the log likelihood function. In
particular, using the steps that led to Eq. (4.109), we can show that the maximum of
$L(\theta)$ occurs at the true value of $\theta$; that is, if $\theta_0$ is the true value of the
parameter, then:

$$L(\theta) \le L(\theta_0) \quad \text{for all } \theta. \tag{8.44}$$

First consider the consistency property. Let $\theta_n^*$ be the maximum likelihood estimate
obtained from maximizing $L(\mathbf{X}_n \mid \theta)$, or equivalently,
$L(\mathbf{X}_n \mid \theta)/n$. According to Eq. (8.43), $L(\mathbf{X}_n \mid \theta)/n$ is a
sequence of functions of $\theta$ that converges to $L(\theta)$. It then follows that the
sequence of maxima of $L(\mathbf{X}_n \mid \theta)/n$, namely $\theta_n^*$, converges to the
maximum of $L(\theta)$, which from Eq. (8.44) is the true value $\theta_0$. Therefore the
maximum likelihood estimator is consistent.

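Next we consider the asymptotic Gaussian property. To characterize the estimation
error, $\theta_n^* - \theta_0$, we apply the mean value theorem³ to the score function in the
interval $[\theta_n^*, \theta_0]$:

$$\left. \frac{\partial}{\partial\theta} L(\mathbf{X}_n; \theta) \right|_{\theta_0}
- \left. \frac{\partial}{\partial\theta} L(\mathbf{X}_n; \theta) \right|_{\theta_n^*}
= \left. \frac{\partial^2}{\partial\theta^2} L(\mathbf{X}_n; \theta) \right|_{\bar{\theta}} (\theta_0 - \theta_n^*)
\quad \text{for some } \bar{\theta},\ \theta_n^* < \bar{\theta} < \theta_0.$$

Note that the second term in the left-hand side is zero since $\theta_n^*$ is the maximum
likelihood estimator for $L(\mathbf{X}_n \mid \theta)$. The estimation error is then:

$$(\theta_n^* - \theta_0) = -\frac{ \left. \frac{\partial}{\partial\theta} L(\mathbf{X}_n; \theta) \right|_{\theta_0} }{ \left. \frac{\partial^2}{\partial\theta^2} L(\mathbf{X}_n; \theta) \right|_{\bar{\theta}} }
= -\frac{ \frac{1}{n}\sum_{j=1}^{n} \left. \frac{\partial}{\partial\theta} \ln f_X(X_j \mid \theta) \right|_{\theta_0} }{ \frac{1}{n}\sum_{j=1}^{n} \left. \frac{\partial^2}{\partial\theta^2} \ln f_X(X_j \mid \theta) \right|_{\bar{\theta}} }. \tag{8.45}$$

By the law of large numbers, the arithmetic average in the denominator approaches its
expected value:

$$\frac{1}{n}\sum_{j=1}^{n} \frac{\partial^2}{\partial\theta^2} \ln f_X(X_j \mid \theta) \to E\!\left[ \frac{\partial^2}{\partial\theta^2} \ln f_X(X \mid \theta) \right] = -I_1(\theta),$$


³f(b) − f(a) = f′(c)(b − a) for some c, a < c < b; see, for example, [Edwards and Penney].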
where we used the alternative expression for the Fisher information of a single
observation. From the consistency property we have that $\theta_n^* \to \theta_0$, and
consequently $\bar{\theta} \to \theta_0$, since $\theta_n^* < \bar{\theta} < \theta_0$. Therefore
the denominator approaches $-I_1(\theta_0)$ and Eq. (8.45) becomes

$$(\theta_n^* - \theta_0) \approx -\frac{ \frac{1}{n}\sum_{j=1}^{n} \left. \frac{\partial}{\partial\theta} \ln f_X(X_j \mid \theta) \right|_{\theta_0} }{ -I_1(\theta_0) }. \tag{8.46}$$

The numerator in Eq. (8.46) is an average of score functions, so

$$(\theta_n^* - \theta_0) \approx -\frac{ \frac{1}{n}\sum_{j=1}^{n} Y_j }{ -I_1(\theta_0) } = \frac{1}{n}\sum_{j=1}^{n} \frac{Y_j}{I_1(\theta_0)}. \tag{8.47}$$

We know that the score function $Y_j$ for a single observation has zero mean and variance
$I_1(\theta_0)$. The denominator in Eq. (8.47) scales each $Y_j$ by the factor
$1/I_1(\theta_0)$, so Eq. (8.47) becomes the sample mean of zero-mean random variables with
variance $I_1(\theta_0)/I_1^2(\theta_0) = 1/I_1(\theta_0)$. The central limit theorem implies that

$$\sqrt{n}\, \frac{\theta_n^* - \theta_0}{\sqrt{1/I_1(\theta_0)}}$$

approaches a zero-mean, unit-variance Gaussian random variable. Therefore
$\sqrt{n}(\theta_n^* - \theta_0)$ approaches a zero-mean Gaussian random variable with variance
$1/I_1(\theta_0)$. The asymptotic efficiency property also follows from this result.
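
The asymptotic Gaussian property can be explored by simulation. The sketch below assumes an exponential random variable with rate λ, for which the maximum likelihood estimate is the reciprocal of the sample mean and $I_1(\lambda) = 1/\lambda^2$ (Example 8.14 with n = 1). The rate, sample size, seed, and the function name `scaled_errors` are arbitrary choices for illustration, and the agreement is only approximate for finite n.

```python
import math
import random
import statistics

def scaled_errors(rate, n, trials, seed=1):
    """Return sqrt(n) * (rate_mle - rate) for repeated samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sample_mean = sum(rng.expovariate(rate) for _ in range(n)) / n
        rate_mle = 1.0 / sample_mean   # ML estimate of the rate of an exponential RV
        errors.append(math.sqrt(n) * (rate_mle - rate))
    return errors

rate, n = 2.0, 400
errors = scaled_errors(rate, n, trials=2000)

empirical_mean = statistics.mean(errors)
empirical_variance = statistics.pvariance(errors)
predicted_variance = rate ** 2         # 1 / I_1(rate), with I_1(rate) = 1 / rate**2

print(empirical_mean, empirical_variance, predicted_variance)
```

A histogram of `errors` would be expected to look approximately Gaussian; here only the first two moments are compared.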


CONFIDENCE INTERVALS 
The sample mean estimator $\bar{X}_n$ provides us with a single numerical value for the estimate
of $E[X] = \mu$, namely,

$$\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j. \tag{8.48}$$

This single number gives no indication of the accuracy of the estimate or the confidence
that we can place on it. We can obtain an indication of accuracy by computing
the sample variance, which is the average dispersion about $\bar{X}_n$:

$$\hat{\sigma}_n^2 = \frac{1}{n - 1}\sum_{j=1}^{n} (X_j - \bar{X}_n)^2. \tag{8.49}$$

If $\hat{\sigma}_n^2$ is small, then the observations are tightly clustered about $\bar{X}_n$, and
we can be confident that $\bar{X}_n$ is close to E[X]. On the other hand, if $\hat{\sigma}_n^2$ is
large, the samples are widely dispersed about $\bar{X}_n$ and we cannot be confident that
$\bar{X}_n$ is close to E[X]. In this section we introduce the notion of confidence intervals,
which approach the question in a different way.


Instead of seeking a single value that we designate to be the “estimate” of the
parameter of interest (i.e., $E[X] = \mu$), we attempt to specify an interval or set of values
that is highly likely to contain the true value of the parameter. In particular, we can
specify some high probability, say $1 - \alpha$, and pose the following problem: Find an
interval $[l(\mathbf{X}), u(\mathbf{X})]$ such that

$$P[l(\mathbf{X}) \le \mu \le u(\mathbf{X})] = 1 - \alpha. \tag{8.50}$$

In other words, we use the observed data to determine an interval that by design
contains the true value of the parameter $\mu$ with probability $1 - \alpha$. We say that such an
interval is a $(1 - \alpha) \times 100\%$ confidence interval.

This approach simultaneously handles the question of the accuracy and confidence
of an estimate. The probability $1 - \alpha$ is a measure of the consistency, and hence
degree of confidence, with which the interval contains the desired parameter: If we
were to compute confidence intervals a large number of times, we would find that
approximately $(1 - \alpha) \times 100\%$ of the time, the computed intervals would contain the
true value of the parameter. For this reason, $1 - \alpha$ is called the confidence level. The
width of a confidence interval is a measure of the accuracy with which we can pinpoint
the estimate of a parameter. The narrower the confidence interval, the more accurately
we can specify the estimate for a parameter.

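The probability in Eq. (8.50) clearly depends on the pdf of the $X_j$'s. In the
remainder of this section, we obtain confidence intervals in the cases where the $X_j$'s
are Gaussian random variables or can be approximated by Gaussian random variables.
We will use the equivalence between the following events:

$$\left\{ -a \le \frac{\bar{X}_n - \mu}{\sigma_X/\sqrt{n}} \le a \right\}
= \left\{ -\frac{a\sigma_X}{\sqrt{n}} \le \bar{X}_n - \mu \le \frac{a\sigma_X}{\sqrt{n}} \right\}$$

$$= \left\{ -\bar{X}_n - \frac{a\sigma_X}{\sqrt{n}} \le -\mu \le -\bar{X}_n + \frac{a\sigma_X}{\sqrt{n}} \right\}
= \left\{ \bar{X}_n - \frac{a\sigma_X}{\sqrt{n}} \le \mu \le \bar{X}_n + \frac{a\sigma_X}{\sqrt{n}} \right\}.$$

The last event describes a confidence interval in terms of the observed data, and the
first event will allow us to calculate probabilities from the sampling distributions.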


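8.4.1 Case 1: $X_j$'s Gaussian; Unknown Mean and Known Variance


Suppose that the $X_j$'s are iid Gaussian random variables with unknown mean $\mu$ and
known variance $\sigma_X^2$. From Example 7.3 and Eqs. (7.17) and (7.18), $\bar{X}_n$ is then a
Gaussian random variable with mean $\mu$ and variance $\sigma_X^2/n$, thus

$$1 - 2Q(z) = P\left[ -z \le \frac{\bar{X}_n - \mu}{\sigma_X/\sqrt{n}} \le z \right]
= P\left[ \bar{X}_n - \frac{z\sigma_X}{\sqrt{n}} \le \mu \le \bar{X}_n + \frac{z\sigma_X}{\sqrt{n}} \right]. \tag{8.51}$$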




Equation (8.51) states that the interval
$[\bar{X}_n - z\sigma_X/\sqrt{n},\ \bar{X}_n + z\sigma_X/\sqrt{n}]$ contains $\mu$ with
probability $1 - 2Q(z)$. If we let $z_{\alpha/2}$ be the critical value such that
$\alpha = 2Q(z_{\alpha/2})$, then the $(1 - \alpha)$ confidence interval for the mean $\mu$ is
given by

$$\left[ \bar{X}_n - z_{\alpha/2}\sigma_X/\sqrt{n},\ \bar{X}_n + z_{\alpha/2}\sigma_X/\sqrt{n} \right]. \tag{8.52}$$

The confidence interval in Eq. (8.52) depends on the sample mean $\bar{X}_n$, the known
variance $\sigma_X^2$ of the $X_j$'s, the number of measurements n, and the confidence level
$1 - \alpha$. Table 8.1 shows the values of $z_\alpha$ corresponding to typical values of
$\alpha$. We can use the Octave function normal_inv(1 − α/2, 0, 1) to find $z_{\alpha/2}$. This
function was introduced in Example 4.51.

When X is not Gaussian but the number of samples n is large, the sample mean
$\bar{X}_n$ will be approximately Gaussian if the central limit theorem applies. Therefore if n is
large, then Eq. (8.52) provides a good approximate confidence interval.
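
The frequency interpretation of the confidence level can be illustrated by simulation: generate many independent samples, compute the interval of Eq. (8.52) for each, and count how often the interval contains the true mean. The sketch below uses Python's standard library in place of Octave; the function name `coverage`, the parameter values, and the seed are hypothetical choices made only for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage(mu, sigma, n, alpha, trials, seed=2):
    """Fraction of Eq. (8.52) intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # One experiment: n iid Gaussian samples with known sigma.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials

fraction = coverage(mu=3.0, sigma=2.0, n=25, alpha=0.05, trials=5000)
print(fraction)   # expected to be close to 1 - alpha = 0.95
```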


Example 8.17 Estimating Signal in Noise 
A voltage X is given by

$$X = v + N,$$

where v is an unknown constant voltage and N is a random noise voltage that has a Gaussian pdf
with zero mean and variance 1 μV². Find the 95% confidence interval for v if the voltage X is
measured 100 independent times and the sample mean is found to be 5.25 μV.

From Example 4.17, we know that the voltage X is a Gaussian random variable with mean
v and variance 1. Thus the 100 measurements $X_1, X_2, \ldots, X_{100}$ are iid Gaussian random
variables with mean v and variance 1. The confidence interval is given by Eq. (8.52) with
$z_{\alpha/2} = 1.96$:

$$\left[ 5.25 - \frac{1.96(1)}{10},\ 5.25 + \frac{1.96(1)}{10} \right] = [5.05, 5.45].$$
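
The interval in Example 8.17 can be reproduced with a few lines of code. This is a sketch using Python's `statistics.NormalDist` to obtain the critical value; the function name `mean_confidence_interval_known_variance` is hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval_known_variance(sample_mean, sigma, n, alpha):
    """Interval of Eq. (8.52) for Gaussian samples with known standard deviation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Example 8.17: 100 measurements, sample mean 5.25, noise standard deviation 1.
low, high = mean_confidence_interval_known_variance(5.25, 1.0, 100, 0.05)
print(round(low, 2), round(high, 2))
```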


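8.4.2 Case 2: $X_j$'s Gaussian; Mean and Variance Unknown


Suppose that the $X_j$'s are iid Gaussian random variables with unknown mean $\mu$ and
unknown variance $\sigma_X^2$, and that we are interested in finding a confidence interval
for the mean $\mu$. Suppose we do the obvious thing in the confidence interval given by
Eq. (8.52) by replacing the variance $\sigma_X^2$ with its estimate, the sample variance
$\hat{\sigma}_n^2$ as given by Eq. (8.17):

$$\left[ \bar{X}_n - \frac{t\hat{\sigma}_n}{\sqrt{n}},\ \bar{X}_n + \frac{t\hat{\sigma}_n}{\sqrt{n}} \right]. \tag{8.53}$$

The probability for the interval in Eq. (8.53) is

$$P\left[ -t \le \frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}} \le t \right]
= P\left[ \bar{X}_n - \frac{t\hat{\sigma}_n}{\sqrt{n}} \le \mu \le \bar{X}_n + \frac{t\hat{\sigma}_n}{\sqrt{n}} \right]. \tag{8.54}$$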
ETa aa E e] 




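The random variable involved in Eq. (8.54) is

$$T = \frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}}. \tag{8.55}$$

At the end of this section we show that T has a Student's t-distribution⁴ with
n − 1 degrees of freedom:

$$f_{n-1}(y) = \frac{\Gamma(n/2)}{\Gamma((n - 1)/2)\sqrt{\pi(n - 1)}} \left( 1 + \frac{y^2}{n - 1} \right)^{-n/2}. \tag{8.56}$$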


Let $F_{n-1}(y)$ be the cdf corresponding to $f_{n-1}(y)$, then the probability in
Eq. (8.54) is given by

$$P\left[ \bar{X}_n - \frac{t\hat{\sigma}_n}{\sqrt{n}} \le \mu \le \bar{X}_n + \frac{t\hat{\sigma}_n}{\sqrt{n}} \right]
= \int_{-t}^{t} f_{n-1}(y)\, dy = F_{n-1}(t) - F_{n-1}(-t)$$

$$= F_{n-1}(t) - (1 - F_{n-1}(t)) = 2F_{n-1}(t) - 1 = 1 - \alpha \tag{8.57}$$

where we used the fact that $f_{n-1}(y)$ is symmetric about y = 0. To obtain a confidence
interval with confidence level $1 - \alpha$, we need to find the critical value
$t_{\alpha/2, n-1}$ for which $1 - \alpha = 2F_{n-1}(t_{\alpha/2, n-1}) - 1$ or equivalently,
$F_{n-1}(t_{\alpha/2, n-1}) = 1 - \alpha/2$. The $(1 - \alpha) \times 100\%$ confidence interval
for the mean $\mu$ is then given by

$$\left[ \bar{X}_n - t_{\alpha/2, n-1}\hat{\sigma}_n/\sqrt{n},\ \bar{X}_n + t_{\alpha/2, n-1}\hat{\sigma}_n/\sqrt{n} \right]. \tag{8.58}$$

The confidence interval in Eq. (8.58) depends on the sample mean $\bar{X}_n$ and
the sample variance $\hat{\sigma}_n^2$, the number of measurements n, and $\alpha$. Table 8.2
shows values of $t_{\alpha, n}$ for typical values of $\alpha$ and n. The Octave function
t_inv(1 − α/2, n − 1) can be used to find the value $t_{\alpha/2, n-1}$.

For a given $1 - \alpha$, the confidence intervals given by Eq. (8.58) should be
wider than those given by Eq. (8.52), since the former assumes that the variance
is unknown. Figure 8.2 compares the Gaussian pdf and the Student's t
pdf. It can be seen that the Student's t pdf's are more dispersed than the
Gaussian pdf and so they indeed lead to wider confidence intervals. On the
other hand, since the accuracy of the sample variance increases with n, we can
expect that the confidence interval given by Eq. (8.58) should approach that
given by Eq. (8.52). It can be seen from Fig. 8.2 that the Student's t pdf's do
approach the pdf of a zero-mean, unit-variance Gaussian random variable
with increasing n. This confirms that Eqs. (8.52) and (8.58) give the same confidence
intervals for large n. Thus the bottom row (n = 1000) of Table 8.2 yields the same
confidence intervals as Table 8.1.


⁴The distribution is named after W. S. Gosset, who published under the pseudonym, “A. Student.”


TABLE 8.2 Critical values for Student's t-distribution: $F_n(t_{\alpha,n}) = 1 - \alpha$.


n\α      0.1      0.05     0.025     0.01     0.005

1      3.0777   6.3137   12.7062   31.8210   63.6559
2      1.8856   2.9200    4.3027    6.9645    9.9250
3      1.6377   2.3534    3.1824    4.5407    5.8408
4      1.5332   2.1318    2.7765    3.7469    4.6041
5      1.4759   2.0150    2.5706    3.3649    4.0321
6      1.4398   1.9432    2.4469    3.1427    3.7074
7      1.4149   1.8946    2.3646    2.9979    3.4995
8      1.3968   1.8595    2.3060    2.8965    3.3554
9      1.3830   1.8331    2.2622    2.8214    3.2498
10     1.3722   1.8125    2.2281    2.7638    3.1693
15     1.3406   1.7531    2.1315    2.6025    2.9467
20     1.3253   1.7247    2.0860    2.5280    2.8453
30     1.3104   1.6973    2.0423    2.4573    2.7500
40     1.3031   1.6839    2.0211    2.4233    2.7045
60     1.2958   1.6706    2.0003    2.3901    2.6603
1000   1.2824   1.6464    1.9623    2.3301    2.5807


FIGURE 8.2
Gaussian pdf and Student's t pdf for n = 4 and n = 8.


Example 8.18 Device Lifetimes


The lifetime of a certain device is known to have a Gaussian distribution. Eight devices are
tested and the sample mean and sample variance for the lifetime obtained are 10 days and 4 days².
Find the 99% confidence interval for the mean lifetime of the device.

For a 99% confidence interval and n − 1 = 7, Table 8.2 gives $t_{\alpha/2, 7} = 3.499$. Thus the
confidence interval is given by

$$\left[ 10 - \frac{(3.499)(2)}{\sqrt{8}},\ 10 + \frac{(3.499)(2)}{\sqrt{8}} \right] = [7.53, 12.47].$$
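
The arithmetic in Example 8.18 can be checked as follows. The Python standard library has no inverse Student's t cdf, so this sketch takes the critical value directly from Table 8.2; the constant and function names are hypothetical.

```python
import math

# Critical value from Table 8.2: alpha/2 = 0.005 with 7 degrees of freedom.
T_CRITICAL_0005_7DOF = 3.4995

def mean_confidence_interval_t(sample_mean, sample_variance, n, t_critical):
    """Interval of Eq. (8.58) given a tabulated Student's t critical value."""
    half_width = t_critical * math.sqrt(sample_variance / n)
    return sample_mean - half_width, sample_mean + half_width

# Example 8.18: n = 8 devices, sample mean 10 days, sample variance 4 days^2.
low, high = mean_confidence_interval_t(10.0, 4.0, 8, T_CRITICAL_0005_7DOF)
print(round(low, 2), round(high, 2))
```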


8.4.3 Case 3: $X_j$'s Non-Gaussian; Mean and Variance Unknown


Equation (8.58) can be misused to compute confidence intervals in experimental
measurements and in computer simulation studies. The use of the method is justified only if
the samples $X_j$ are iid and approximately Gaussian.

If the random variables $X_j$ are not Gaussian, the above method for computing
confidence intervals can be modified using the method of batch means. This method
involves performing a series of independent experiments in which the sample mean $\bar{X}$
of the random variable is computed. If we assume that in each experiment each sample
mean is calculated from a large number n of iid observations, then the central limit
theorem implies that the sample mean in each experiment is approximately Gaussian. We
can therefore compute a confidence interval from Eq. (8.58) using the set of sample
means $\bar{X}$ as the $X_j$'s.


Example 8.19 Method of Batch Means 


A computer simulation program generates exponentially distributed random variables of unknown 
mean. Two hundred samples of these random variables are generated and grouped into 10 batches 
of 20 samples each. The sample means of the 10 batches are given below: 


1.04190 0.64064 0.80967 0.75852 1.12439 


1.30220 0.98478 0.64574 1.39064 1.26890 


Find the 90% confidence interval for the mean of the random variable. 
The sample mean and the sample variance of the batch sample means are calculated from
the above data and found to be

$$\bar{X}_{10} = 0.99674 \qquad \hat{\sigma}_{10}^2 = 0.07586.$$

The 90% confidence interval is given by Eq. (8.58) with $t_{\alpha/2, 9} = 1.833$ from Table 8.2:

$$[0.83709, 1.15639].$$

This confidence interval suggests that E[X] = 1. Indeed the simulation program used to generate
the above data was set to produce exponential random variables with mean one.
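
The calculation in Example 8.19 can be reproduced directly from the ten batch means listed in the example. The sketch takes the critical value from Table 8.2 rather than computing it; the variable names are hypothetical.

```python
import math
import statistics

# Batch sample means from Example 8.19 (10 batches of 20 samples each).
batch_means = [1.04190, 0.64064, 0.80967, 0.75852, 1.12439,
               1.30220, 0.98478, 0.64574, 1.39064, 1.26890]

# Critical value from Table 8.2: alpha/2 = 0.05 with 9 degrees of freedom.
T_CRITICAL = 1.8331

grand_mean = statistics.mean(batch_means)
batch_variance = statistics.variance(batch_means)   # unbiased, divides by n - 1
half_width = T_CRITICAL * math.sqrt(batch_variance / len(batch_means))
interval = (grand_mean - half_width, grand_mean + half_width)

print(grand_mean, batch_variance, interval)
```

The batch means, not the 200 underlying exponential samples, play the role of the approximately Gaussian observations in Eq. (8.58).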




8.4.4 Confidence Intervals for the Variance of a Gaussian Random Variable 


In principle, confidence intervals can be computed for any parameter $\theta$ as long as the
sampling distribution of an estimator for the parameter is known. Suppose we wish to
find a confidence interval for the variance of a Gaussian random variable. Assume the
mean is not known. Consider the unbiased sample variance estimator:

$$\hat{\sigma}_n^2 = \frac{1}{n - 1}\sum_{j=1}^{n} (X_j - \bar{X}_n)^2.$$

The random variable

$$\chi^2 = \frac{(n - 1)\hat{\sigma}_n^2}{\sigma_X^2} = \frac{1}{\sigma_X^2}\sum_{j=1}^{n} (X_j - \bar{X}_n)^2$$

has a chi-square distribution with n − 1 degrees of freedom. We use this to develop
confidence intervals for the variance of a Gaussian random variable.

The chi-square random variable was introduced in Example 4.34. It is easy to show
(see Problem 8.6a) that the sum of the squares of n iid zero-mean, unit-variance Gaussian
random variables results in a chi-square random variable of degree n. Figure 8.3
shows the pdf of a chi-square random variable with 10 degrees of freedom. We need to
find an interval that contains $\sigma_X^2$ with probability $1 - \alpha$. We select two
intervals, one for small values and one for large values of a chi-square random variable
$\chi^2_{n-1}$, each of which has probability $\alpha/2$, as shown in Fig. 8.3:

$$1 - \alpha = P\left[ \chi^2_{1-\alpha/2, n-1} \le \frac{(n - 1)\hat{\sigma}_n^2}{\sigma_X^2} \le \chi^2_{\alpha/2, n-1} \right]
= 1 - P\left[ \chi^2_{n-1} \le \chi^2_{1-\alpha/2, n-1} \right] - P\left[ \chi^2_{n-1} > \chi^2_{\alpha/2, n-1} \right].$$

FIGURE 8.3
Critical values of chi-square random variables

The above probability is equivalent to:

$$1 - \alpha = P\left[ \frac{(n - 1)\hat{\sigma}_n^2}{\chi^2_{\alpha/2, n-1}} \le \sigma_X^2 \le \frac{(n - 1)\hat{\sigma}_n^2}{\chi^2_{1-\alpha/2, n-1}} \right]$$
and so we obtain the $(1 - \alpha)$ confidence interval for the variance $\sigma_X^2$:

$$\left[ \frac{(n - 1)\hat{\sigma}_n^2}{\chi^2_{\alpha/2, n-1}},\ \frac{(n - 1)\hat{\sigma}_n^2}{\chi^2_{1-\alpha/2, n-1}} \right]. \tag{8.59}$$

Tables for the critical values $\chi^2_{\alpha/2, n-1}$ for which

$$P\left[ \chi^2_{n-1} > \chi^2_{\alpha/2, n-1} \right] = \alpha/2$$

can be found in statistics handbooks such as [Kokoska]. Table 8.3 provides a small set
of critical values for the chi-square distribution. These values can also be found using
the Octave function chisquare_inv(1 − α/2, n − 1).


Example 8.20 The Sample Variance 


The sample variance in 10 measurements of a noise voltage is 5.67 millivolts². Find a 90%
confidence interval for the variance. We need to find the critical values for $\alpha/2 = 0.05$ and
$1 - \alpha/2 = 0.95$. From either Table 8.3 or Octave we find:

chisquare_inv(.95,9) = 16.92    chisquare_inv(.05,9) = 3.33.

The confidence interval for the variance is then:

$$\left[ \frac{(n - 1)\hat{\sigma}_n^2}{\chi^2_{\alpha/2, n-1}},\ \frac{(n - 1)\hat{\sigma}_n^2}{\chi^2_{1-\alpha/2, n-1}} \right]
= \left[ \frac{9(5.67)}{16.92},\ \frac{9(5.67)}{3.33} \right] = [3.02, 15.32].$$
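
The interval in Example 8.20 follows from Eq. (8.59) once the two critical values are known. The sketch below takes them from the 9-degrees-of-freedom row of Table 8.3, since the Python standard library has no inverse chi-square cdf; the constant and function names are hypothetical. Because the table values carry more digits than the rounded 3.33 used in the example, the upper endpoint differs slightly in the second decimal place.

```python
# Critical values from Table 8.3, row n = 9 (degrees of freedom).
CHI2_UPPER = 16.9190   # column 0.05: P[chi2 > value] = 0.05
CHI2_LOWER = 3.3251    # column 0.95: P[chi2 > value] = 0.95

def variance_confidence_interval(sample_variance, n, chi2_lower, chi2_upper):
    """Interval of Eq. (8.59) for the variance of a Gaussian random variable."""
    scaled = (n - 1) * sample_variance
    # Dividing by the larger critical value gives the lower endpoint.
    return scaled / chi2_upper, scaled / chi2_lower

# Example 8.20: n = 10 measurements, sample variance 5.67.
low, high = variance_confidence_interval(5.67, 10, CHI2_LOWER, CHI2_UPPER)
print(round(low, 2), round(high, 2))
```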


Summary of Confidence Intervals for Gaussian Random Variables 


In this section we have developed confidence intervals for the mean and variance of 
Gaussian random variables. The choice of confidence interval method depends on which 
parameters are known and on whether the number of samples is small or large. The cen- 
tral limit theorem makes the confidence intervals presented here applicable in a broad 
range of situations. Table 8.4 summarizes the confidence intervals developed in this sec- 
tion. The assumptions for each case and the corresponding confidence intervals are listed. 


Sampling Distributions for the Gaussian Random Variable 


In this section we derive the joint sampling distribution for the sample mean and the
sample variance of the Gaussian random variables. Let
$\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ consist of independent, identically distributed versions
of a Gaussian random variable with mean $\mu$ and variance $\sigma_X^2$. We will develop the
following results:


1. The sample mean $\bar{X}_n$ and the sample variance $\hat{\sigma}_n^2$ are independent random variables.

TABLE 8.3 Critical values for chi-square distribution, $P[\chi^2 > \chi^2_{\alpha, n-1}] = \alpha$.


n\α 0.995 0.975 0.95 0.05 0.025 0.01 0.005
1 3.9271E-05 0.0010 0.0039 3.8415 5.0239 6.6349 7.8794 
2 0.0100 0.0506 0.1026 5.9915 7.3778 9.2104 10.5965 
3 0.0717 0.2158 0.3518 7.8147 9.3484 11.3449 12.8381 
4 0.2070 0.4844 0.7107 9.4877 11.1433 13.2767 14.8602 
5 0.4118 0.8312 1.1455 11.0705 12.8325 15.0863 16.7496 
6 0.6757 1.2373 1.6354 12.5916 14.4494 16.8119 18.5475 
7 0.9893 1.6899 2.1673 14.0671 16.0128 18.4753 20.2777 
8 1.3444 2.1797 2.7326 15.5073 17.5345 20.0902 21.9549 
9 1.7349 2.7004 3.3251 16.9190 19.0228 21.6660 23.5893 
10 2.1558 3.2470 3.9403 18.3070 20.4832 23.2093 25.1881 
11 2.6032 3.8157 4.5748 19.6752 21.9200 24.7250 26.7569 
12 3.0738 4.4038 5.2260 21.0261 23.3367 26.2170 28.2997 
13 3.5650 5.0087 5.8919 22.3620 24.7356 27.6882 29.8193 
14 4.0747 5.6287 6.5706 23.6848 26.1189 29.1412 31.3194 
15 4.6009 6.2621 7.2609 24.9958 27.4884 30.5780 32.8015 
16 5.1422 6.9077 7.9616 26.2962 28.8453 31.9999 34.2671 
17 5.6973 7.5642 8.6718 27.5871 30.1910 33.4087 35.7184 
18 6.2648 8.2307 9.3904 28.8693 31.5264 34.8052 37.1564 
19 6.8439 8.9065 10.1170 30.1435 32.8523 36.1908 38.5821 
20 7.4338 9.5908 10.8508 31.4104 34.1696 37.5663 39.9969 
21 8.0336 10.2829 11.5913 32.6706 35.4789 38.9322 41.4009 
22 8.6427 10.9823 12.3380 33.9245 36.7807 40.2894 42.7957 
23 9.2604 11.6885 13.0905 35.1725 38.0756 41.6383 44.1814 
24 9.8862 12.4011 13.8484 36.4150 39.3641 42.9798 45.5584 
25 10.5196 13.1197 14.6114 37.6525 40.6465 44.3140 46.9280 
26 11.1602 13.8439 15.3792 38.8851 41.9231 45.6416 48.2898 
27 11.8077 14.5734 16.1514 40.1133 43.1945 46.9628 49.6450 
28 12.4613 15.3079 16.9279 41.3372 44.4608 48.2782 50.9936 
29 13.1211 16.0471 17.7084 42.5569 45.7223 49.5878 52.3355 
30 13.7867 16.7908 18.4927 43.7730 46.9792 50.8922 53.6719 
40 20.7066 24.4331 26.5093 55.7585 59.3417 63.6908 66.7660 
50 27.9908 32.3574 34.7642 67.5048 71.4202 76.1538 79.4898 
60 35.5344 40.4817 43.1880 79.0820 83.2977 88.3794 91.9518 
70 43.2753 48.7575 51.7393 90.5313 95.0231 100.4251 104.2148 
80 51.1719 57.1532 60.3915 101.8795 106.6285 112.3288 116.3209 
90 59.1963 65.6466 69.1260 113.1452 118.1359 124.1162 128.2987 
100 67.3275 74.2219 77.9294 124.3421 129.5613 135.8069 140.1697 


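2. The random variable $(n-1)\hat{\sigma}_n^2/\sigma_X^2$ has a chi-square distribution with $n - 1$ degrees of freedom.


3. The statistic

$$\frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}} \tag{8.60}$$

has a Student’s t-distribution.


TABLE 8.4 Summary of confidence intervals for Gaussian and non-Gaussian random variables.


Parameter | Case | Confidence Interval
$\mu$ | Gaussian random variable, $\sigma^2$ known | $[\bar{X}_n - z_{\alpha/2}\sigma/\sqrt{n},\ \bar{X}_n + z_{\alpha/2}\sigma/\sqrt{n}]$
$\mu$ | Non-Gaussian random variable, n large, $\sigma^2$ known | $[\bar{X}_n - z_{\alpha/2}\sigma/\sqrt{n},\ \bar{X}_n + z_{\alpha/2}\sigma/\sqrt{n}]$
$\mu$ | Gaussian random variable, $\sigma^2$ unknown | $[\bar{X}_n - t_{\alpha/2,\,n-1}\hat{\sigma}_n/\sqrt{n},\ \bar{X}_n + t_{\alpha/2,\,n-1}\hat{\sigma}_n/\sqrt{n}]$
$\mu$ | Non-Gaussian random variable, $\sigma^2$ unknown, batch means | $[\bar{X}_n - t_{\alpha/2,\,n-1}\hat{\sigma}_n/\sqrt{n},\ \bar{X}_n + t_{\alpha/2,\,n-1}\hat{\sigma}_n/\sqrt{n}]$
$\sigma^2$ | Gaussian random variable, $\mu$ unknown | $\left[ (n-1)\hat{\sigma}_n^2/\chi^2_{\alpha/2,\,n-1},\ (n-1)\hat{\sigma}_n^2/\chi^2_{1-\alpha/2,\,n-1} \right]$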


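These three results are needed to develop confidence intervals for the mean and variance of Gaussian distributed observations.

First we show that the sample mean $\bar{X}_n$ and the sample variance $\hat{\sigma}_n^2$ are independent random variables. For the sample mean we have

$$n\bar{X}_n = \sum_{j=1}^{n-1} X_j + X_n,$$

which implies that

$$X_n - \bar{X}_n = (n-1)\bar{X}_n - \sum_{j=1}^{n-1} X_j = -\sum_{j=1}^{n-1} (X_j - \bar{X}_n).$$

By replacing the last term in the sum that defines $\hat{\sigma}_n^2$, we obtain

$$(n-1)\hat{\sigma}_n^2 = \sum_{j=1}^{n} (X_j - \bar{X}_n)^2 = \sum_{j=1}^{n-1} (X_j - \bar{X}_n)^2 + \left\{ \sum_{j=1}^{n-1} (X_j - \bar{X}_n) \right\}^2. \tag{8.61}$$

Therefore $\hat{\sigma}_n^2$ is determined by $Y_i = X_i - \bar{X}_n$ for $i = 1, \ldots, n-1$.

Next we show that $\bar{X}_n$ and $Y_i = X_i - \bar{X}_n$ are uncorrelated. Since $E[Y_i] = 0$, it suffices to show that $E[\bar{X}_n Y_i] = 0$:

$$\begin{aligned}
E[\bar{X}_n(X_i - \bar{X}_n)] &= E[\bar{X}_n X_i] - E[\bar{X}_n^2] \\
&= \frac{1}{n}\sum_{j=1}^{n} E[X_j X_i] - \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n} E[X_j X_k] \\
&= \frac{1}{n}\left\{ (n-1)\mu^2 + E[X^2] \right\} - \frac{1}{n^2}\left\{ n(n-1)\mu^2 + nE[X^2] \right\} = 0.
\end{aligned}$$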


Define the $(n-1)$-dimensional vector $\mathbf{Y} = (X_1 - \bar{X}_n, X_2 - \bar{X}_n, \ldots, X_{n-1} - \bar{X}_n)$; then $\mathbf{Y}$ and $\bar{X}_n$ are uncorrelated. Furthermore, $\mathbf{Y}$ and $\bar{X}_n$ are defined by the following linear transformation:

$$\begin{aligned}
Y_1 &= X_1 - \bar{X}_n = (1 - 1/n)X_1 - X_2/n - \cdots - X_n/n \\
Y_2 &= X_2 - \bar{X}_n = -X_1/n + (1 - 1/n)X_2 - \cdots - X_n/n \\
&\ \ \vdots \\
Y_{n-1} &= X_{n-1} - \bar{X}_n = -X_1/n - \cdots + (1 - 1/n)X_{n-1} - X_n/n \\
\bar{X}_n &= X_1/n + X_2/n + \cdots + X_n/n.
\end{aligned} \tag{8.63}$$

The first $n - 1$ equations correspond to the terms in $\mathbf{Y}$ and the last term corresponds to $\bar{X}_n$. We have shown that $\mathbf{Y}$ and $\bar{X}_n$ are defined by a linear transformation of jointly Gaussian random variables $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$. It follows that $\mathbf{Y}$ and $\bar{X}_n$ are jointly Gaussian. The fact that the components of $\mathbf{Y}$ and $\bar{X}_n$ are uncorrelated implies that the components of $\mathbf{Y}$ are independent of $\bar{X}_n$. Recalling from Eq. (8.61) that $\hat{\sigma}_n^2$ is completely determined by the components of $\mathbf{Y}$, we conclude that $\hat{\sigma}_n^2$ and $\bar{X}_n$ are independent random variables.

We now show that $(n-1)\hat{\sigma}_n^2/\sigma_X^2$ has a chi-square distribution with $n - 1$ degrees of freedom. Using Eq. (8.15), we can express $(n-1)\hat{\sigma}_n^2$ as:

$$(n-1)\hat{\sigma}_n^2 = \sum_{j=1}^{n} (X_j - \bar{X}_n)^2 = \sum_{j=1}^{n} (X_j - \mu)^2 - n(\bar{X}_n - \mu)^2,$$

which can be rearranged as follows after dividing both sides by $\sigma_X^2$:

$$\sum_{j=1}^{n} \left( \frac{X_j - \mu}{\sigma_X} \right)^2 = \frac{(n-1)\hat{\sigma}_n^2}{\sigma_X^2} + \left( \frac{\bar{X}_n - \mu}{\sigma_X/\sqrt{n}} \right)^2.$$

The left-hand side of the above equation is the sum of the squares of n zero-mean, unit-variance independent Gaussian random variables. From Problem 7.6 we know that this sum is a chi-square random variable with n degrees of freedom. The rightmost term in the above equation is the square of a zero-mean, unit-variance Gaussian random variable and hence it is chi square with one degree of freedom. Finally, the two terms on the right-hand side of the equation are independent random variables since one depends on the sample variance and the other on the sample mean. Let $\Phi(\omega)$ denote the characteristic function of the sample variance term. Using characteristic functions, the above equation becomes:

$$\left( \frac{1}{1 - 2j\omega} \right)^{n/2} = \Phi_n(\omega) = \Phi(\omega)\Phi_1(\omega) = \Phi(\omega)\left( \frac{1}{1 - 2j\omega} \right)^{1/2},$$

where we have inserted the expression for the chi-square random variables of degree n and degree 1. We can finally solve for the characteristic function of $(n-1)\hat{\sigma}_n^2/\sigma_X^2$:

$$\Phi(\omega) = \left( \frac{1}{1 - 2j\omega} \right)^{(n-1)/2}.$$

We conclude that $(n-1)\hat{\sigma}_n^2/\sigma_X^2$ is a chi-square random variable with $n - 1$ degrees of freedom.
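The two results derived so far can be checked empirically. The following Python sketch is a Monte Carlo illustration, not part of the derivation: the values of n, μ, σ, the number of trials, and the seed are arbitrary assumptions, and `simulate` is a hypothetical helper name. A chi-square random variable with n - 1 degrees of freedom has mean n - 1 and variance 2(n - 1), and independence implies zero correlation between the sample mean and the scaled sample variance.

```python
import math
import random

def simulate(n=10, mu=1.0, sigma=2.0, trials=20000, seed=0):
    """Draw many Gaussian samples of size n; return sample means and (n-1)s^2/sigma^2."""
    rng = random.Random(seed)
    means, scaled_vars = [], []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # unbiased sample variance
        means.append(m)
        scaled_vars.append((n - 1) * s2 / sigma**2)
    return means, scaled_vars

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

means, scaled_vars = simulate()
avg = mean(scaled_vars)                                # expected near n - 1 = 9
var = mean([(v - avg) ** 2 for v in scaled_vars])      # expected near 2(n - 1) = 18
corr = correlation(means, scaled_vars)                 # expected near 0
print(f"mean={avg:.3f}  variance={var:.3f}  correlation={corr:.4f}")
```

Zero correlation alone does not prove independence; the simulation only shows that the data are consistent with the results proved above.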


Finally we consider the statistic:

$$T = \frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)/\sigma_X}{\sqrt{\hat{\sigma}_n^2/\sigma_X^2}} = \frac{(\bar{X}_n - \mu)/(\sigma_X/\sqrt{n})}{\sqrt{\left\{ (n-1)\hat{\sigma}_n^2/\sigma_X^2 \right\}/(n-1)}}. \tag{8.64}$$

The numerator in Eq. (8.64) is a zero-mean, unit-variance Gaussian random variable. We have just shown that $\{(n-1)\hat{\sigma}_n^2/\sigma_X^2\}$ is chi square with $n - 1$ degrees of freedom. The numerator and denominator in the above expression are independent random variables since one depends on the sample mean and the other on the sample variance. In Example 6.14, we showed that given these conditions, T then has a Student’s t-distribution with $n - 1$ degrees of freedom.


8.5 HYPOTHESIS TESTING


In some situations we are interested in testing an assertion about a population based on a random sample $\mathbf{X}_n$. This assertion is stated in the form of a hypothesis about the underlying distribution of X, and the objective of the test is to accept or reject the hypothesis based on the observed data $\mathbf{X}_n$. Examples of such assertions are:


• A given coin is fair.

• A new manufacturing process produces “new and improved” batteries that last longer.

• Two random noise signals have the same mean.


We first consider significance testing where the objective is to accept or reject a given “null” hypothesis $H_0$. Next we consider the testing of $H_0$ against an alternative hypothesis $H_1$. We develop decision rules for determining the outcome of each test and introduce metrics for assessing the goodness or quality of these rules.

In this section we use the traditional approach to hypothesis testing where we assume that the parameters of a distribution are unknown but not random. In the next section we use Bayesian models where the parameters of a distribution are random variables with known a priori probabilities.


8.5.1 Significance Testing


Suppose we want to test the hypothesis that a given coin is fair. We perform 100 flips of the coin and observe the number of heads N. Based on the value of N we must decide whether to accept or reject the hypothesis. Essentially, we need to divide the set of possible outcomes of the coin flips {0, 1, ..., 100} into a set of values for which we accept the hypothesis and another set of values for which we reject it. If the coin is fair we expect the value of N to be close to 50, so we include the numbers close to 50 in the set for which we accept the hypothesis. But at exactly what values do we start rejecting the hypothesis? There are many ways of partitioning the observation space into two regions, and clearly we need some criterion to guide us in making this choice.

In the general case we wish to test a hypothesis $H_0$ about a parameter $\theta$ of the random variable X. We call $H_0$ the null hypothesis. The objective of a significance test is to accept or reject the null hypothesis based on a random sample $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$. In particular we are interested in whether the observed data $\mathbf{X}_n$ is significantly different from what would be expected if the null hypothesis is true. To specify a decision rule we partition the observation space into a rejection or critical region $R$ where we reject the hypothesis and an acceptance region $R^c$ where we accept the hypothesis. The decision rule is then:

$$\begin{aligned}
&\text{Accept } H_0 \text{ if } \mathbf{X}_n \in R^c \\
&\text{Reject } H_0 \text{ if } \mathbf{X}_n \in R.
\end{aligned} \tag{8.65}$$

Two kinds of errors can occur when executing this decision rule:

Type I error: Reject $H_0$ when $H_0$ is true. (8.66)
Type II error: Accept $H_0$ when $H_0$ is false.

If the hypothesis is true, then we can evaluate the probability of a Type I error:

$$\alpha = P[\text{Type I error}] = \int_{\mathbf{x}_n \in R} f_X(\mathbf{x}_n \mid H_0)\, d\mathbf{x}_n. \tag{8.67}$$

If the null hypothesis is false, we have no information about the true distribution of the observations $\mathbf{X}_n$ and hence we cannot evaluate the probability of Type II errors.

We call $\alpha$ the significance level of a test, and this value represents our tolerance for Type I errors, that is, of rejecting $H_0$ when in fact it is true. The level of significance of a test provides an important design criterion for testing. Specifically, the rejection region is chosen so that the probability of Type I error is no greater than a specified level $\alpha$. Typical values of $\alpha$ are 1% and 5%.


Example 8.21 Testing a Fair Coin 


Consider the significance test for $H_0$: coin is fair, that is, $p = 1/2$. Find a test at a significance level of 5%.

We count the number of heads N in 100 flips of the coin. To find the rejection region R, we need to identify a subset of $S = \{0, 1, \ldots, n\}$, with $n = 100$, that has probability $\alpha$ when the coin is fair. For example, we can let R be the set of integers outside the range $50 \pm c$:

$$\begin{aligned}
\alpha = 0.05 &= 1 - P[50 - c \le N \le 50 + c \mid H_0] \\
&= 1 - \sum_{j=50-c}^{50+c} \binom{100}{j} \left( \frac{1}{2} \right)^{100} \approx P\left[ \left| \frac{N - 50}{\sqrt{100(1/2)(1/2)}} \right| > \frac{c}{5} \right] = 2Q\left( \frac{c}{5} \right)
\end{aligned}$$

where we have used the Gaussian approximation to the binomial cdf. The two-sided critical value is $z_{0.025} = 1.96$ where $Q(z_{0.025}) = 0.05/2 = 0.025$. The desired value of c is then given by $c/5 = 1.96$, that is, $c = 9.8 \approx 10$, which gives the acceptance region $R^c = \{40, 41, \ldots, 60\}$ and rejection region $R = \{k: |k - 50| > 10\}$.

Note, however, that the choice of R is not unique. As long as we meet the desired significance level, we could let R be the integers greater than $50 + c$:

$$0.05 = P[N \ge 50 + c \mid H_0] \approx P\left[ \frac{N - 50}{5} \ge \frac{c}{5} \right] = Q\left( \frac{c}{5} \right).$$

The value $z_{0.05} = 1.64$ gives $Q(z_{0.05}) = 0.05$, which implies $c = 5 \times 1.64 \approx 8$ and the corresponding acceptance region is $R^c = \{0, 1, \ldots, 58\}$ and rejection region $R = \{k > 58\}$.

Either of the above two choices of rejection region satisfies the significance level requirement. Intuitively, we have reason to believe that the two-sided choice of rejection region is more appropriate since deviations on the high or low side are significant insofar as judging the fairness of the coin is concerned. However, we need additional criteria to justify this choice.
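Because the rejection regions in Example 8.21 were obtained with a Gaussian approximation and then rounded to integers, it can be useful to check the significance levels they actually achieve. The sketch below evaluates both regions with the exact binomial pmf; the helper name `binomial_pmf` is an assumption introduced for illustration.

```python
from math import comb

def binomial_pmf(k, n=100, p=0.5):
    """P[N = k] for N binomial with parameters n and p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

# Two-sided region R = {k : |k - 50| > 10}.
alpha_two_sided = sum(binomial_pmf(k) for k in range(101) if abs(k - 50) > 10)

# One-sided region R = {k : k > 58}.
alpha_one_sided = sum(binomial_pmf(k) for k in range(59, 101))

print(f"two-sided: {alpha_two_sided:.4f}")
print(f"one-sided: {alpha_one_sided:.4f}")
```

Both exact values fall below the 5% target, which is consistent with the requirement that the Type I error probability be no greater than the specified level.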


The previous example shows rejection regions that are defined in terms of either 
two tails or one tail of the distribution. We say that a test is two-tailed or two-sided if it 
involves two tails, that is, the rejection region consists of two intervals. Similarly, we 
refer to one-tailed or one-sided regions where the rejection region consists of a single 
interval. 


Example 8.22 Testing an Improved Battery 


A manufacturer claims that its new improved batteries have a longer lifetime. The old batteries 
are known to have a lifetime that is Gaussian distributed with mean 150 hours and variance 16. 
We measure the lifetime of nine batteries and obtain a sample mean of 155 hours. We assume 
that the variance of the lifetime is unchanged. Find a test at a 1% significance level. 

Let $H_0$ be “battery lifetime is unchanged.” If $H_0$ is true, then the sample mean $\bar{X}_9$ is Gaussian with mean 150 and variance 16/9. We reject the null hypothesis if the sample mean is significantly greater than 150. This leads to a one-sided test of the form $R = \{\bar{X}_9 > 150 + c\}$. We select the constant c to achieve the desired significance level:

$$\alpha = 0.01 = P[\bar{X}_9 > 150 + c \mid H_0] = P\left[ \frac{\bar{X}_9 - 150}{\sqrt{16/9}} > \frac{c}{\sqrt{16/9}} \right] = Q\left( \frac{c}{4/3} \right).$$

The critical value $z_{0.01} = 2.326$ corresponds to $Q(z_{0.01}) = 0.01 = \alpha$. Thus $3c/4 = 2.326$, or $c = 3.10$. The rejection region is then $\bar{X}_9 \ge 150 + 3.10 = 153.10$. The observed sample mean 155 is in the rejection region and so we reject the null hypothesis. The data suggest that the lifetime has improved.
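The threshold in Example 8.22 can be reproduced numerically. This is a minimal Python sketch using the standard library's `statistics.NormalDist` for the Gaussian inverse cdf; the function name `one_sided_threshold` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_threshold(m0, variance, n, alpha):
    """Smallest sample mean that is rejected by a one-sided test at level alpha."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # critical value with Q(z_alpha) = alpha
    return m0 + z_alpha * sqrt(variance / n)

threshold = one_sided_threshold(m0=150.0, variance=16.0, n=9, alpha=0.01)
observed_mean = 155.0
reject = observed_mean > threshold
print(f"threshold = {threshold:.2f}, reject H0: {reject}")
```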


An alternative approach to hypothesis testing is to not set the level $\alpha$ ahead of time and thus not decide on a rejection region. Instead, based on the observation, e.g., $\bar{X}_n$, we ask the question, “Assuming $H_0$ is true, what is the probability that the statistic would assume a value as extreme or more extreme than $\bar{X}_n$?” We call this probability the p-value of the test statistic. If $p(\bar{X}_n)$ is close to one, then there is no reason to reject the null hypothesis, but if $p(\bar{X}_n)$ is small, then there is reason to reject the null hypothesis.

For example, in Example 8.22, the sample mean of 155 hours for $n = 9$ batteries has a p-value:

$$P[\bar{X}_9 > 155 \mid H_0] = P\left[ \frac{\bar{X}_9 - 150}{\sqrt{16/9}} > \frac{5}{\sqrt{16/9}} \right] = Q\left( \frac{5}{4/3} \right) = 8.84 \times 10^{-5}.$$

Note that an observation value of 153.10 would yield a p-value of 0.01. The p-value for 155 is much smaller, so clearly this observation calls for the null hypothesis to be rejected at 1% and even lower levels.


Testing Simple Hypotheses


A hypothesis test involves the testing of two or more hypotheses based on observed data. We will focus on the binary hypothesis case where we test a null hypothesis $H_0$ against an alternative hypothesis $H_1$. The outcome of the test is: accept $H_0$; or reject $H_0$ and accept $H_1$. A simple hypothesis specifies the associated distribution completely. If the distribution is not specified completely (e.g., a Gaussian pdf with mean zero and unknown variance), then we say that we have a composite hypothesis. We consider the testing of two simple hypotheses first. This case appears frequently in electrical engineering in the context of communications systems.

When the alternative hypothesis is simple, we can evaluate the probability of Type II errors, that is, of accepting $H_0$ when $H_1$ is true:

$$\beta = P[\text{Type II error}] = \int_{\mathbf{x}_n \in R^c} f_X(\mathbf{x}_n \mid H_1)\, d\mathbf{x}_n. \tag{8.68}$$

The probability of Type II error provides us with a second criterion in the design of a hypothesis test.


Example 8.23 The Radar Detection Problem 


A radar system needs to distinguish between the presence or absence of a target. We pose the 
following simple binary hypothesis test based on the received signal X: 


$H_0$: no target present, X is Gaussian with $\mu = 0$ and $\sigma_X^2 = 1$
$H_1$: target present, X is Gaussian with $\mu = 1$ and $\sigma_X^2 = 1$.

Unlike the case of significance testing, the pdf for the observation is given for both hypotheses:

$$f_X(x \mid H_0) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

$$f_X(x \mid H_1) = \frac{1}{\sqrt{2\pi}} e^{-(x-1)^2/2}.$$

Figure 8.4 shows the pdf of the observation under each of the hypotheses. The rejection region should be clearly of the form $\{X > \gamma\}$ for some suitable constant $\gamma$. The decision rule is then:

$$\begin{aligned}
&\text{Accept } H_0 \text{ if } X \le \gamma \\
&\text{Accept } H_1 \text{ if } X > \gamma.
\end{aligned} \tag{8.69}$$

FIGURE 8.4
Rejection region. (Curves $f_X(x \mid H_0)$ and $f_X(x \mid H_1)$; horizontal axis marked at 0, $\gamma$, and 1; rejection region R indicated.)

The Type I error corresponds to a false alarm and is given by:

$$\alpha = P[X > \gamma \mid H_0] = \int_{\gamma}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = Q(\gamma) = P_{FA}. \tag{8.70}$$

The Type II error corresponds to a miss and is given by:

$$\beta = P[X \le \gamma \mid H_1] = \int_{-\infty}^{\gamma} \frac{1}{\sqrt{2\pi}} e^{-(x-1)^2/2}\, dx = 1 - Q(\gamma - 1) = 1 - P_D, \tag{8.71}$$

where $P_D$ is the probability of detection when the target is present. Note the tradeoff between the two types of errors: As $\gamma$ increases, the Type I error probability $\alpha$ decreases from 1 to 0, while the Type II error probability $\beta$ increases from 0 to 1. The choice of $\gamma$ strikes a balance between the two types of errors.
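The tradeoff in Example 8.23 can be tabulated directly from Eqs. (8.70) and (8.71). The Python sketch below does so for a few thresholds; the function names `q` and `error_probabilities` and the list of thresholds are illustrative assumptions.

```python
import math

def q(x):
    """Gaussian tail probability Q(x) = P[Z > x] for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def error_probabilities(gamma):
    """Return (alpha, beta) for the rule 'accept H1 if X > gamma'."""
    alpha = q(gamma)             # false alarm probability, Eq. (8.70)
    beta = 1.0 - q(gamma - 1.0)  # miss probability, Eq. (8.71)
    return alpha, beta

thresholds = [-1.0, 0.0, 0.5, 1.0, 2.0]
table = [(g, *error_probabilities(g)) for g in thresholds]
for g, alpha, beta in table:
    print(f"gamma={g:5.2f}  alpha={alpha:.4f}  beta={beta:.4f}")
```

At the midpoint threshold of 0.5 the two error probabilities coincide, because the two pdfs are mirror images of each other about that point.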


The following example shows that the number of observation samples n provides 
an additional degree of freedom in designing a hypothesis test. 


Example 8.24 Using Sample Size to Select Type I and Type II Error Probabilities


Select the number of samples n in the radar detection problem so that the probability of false alarm is $\alpha = P_{FA} = 0.05$ and the probability of detection is $P_D = 1 - \beta = 0.99$.

If $H_0$ is true, then the sample mean of n independent observations $\bar{X}_n$ is Gaussian with mean zero and variance 1/n. If $H_1$ is true, then $\bar{X}_n$ is Gaussian with mean 1 and variance 1/n. The false alarm probability is:

$$\alpha = P[\bar{X}_n > \gamma \mid H_0] = \int_{\gamma}^{\infty} \frac{\sqrt{n}}{\sqrt{2\pi}} e^{-nx^2/2}\, dx = Q(\sqrt{n}\,\gamma) = P_{FA}, \tag{8.72}$$

and the detection probability is:

$$P_D = P[\bar{X}_n > \gamma \mid H_1] = \int_{\gamma}^{\infty} \frac{\sqrt{n}}{\sqrt{2\pi}} e^{-n(x-1)^2/2}\, dx = Q(\sqrt{n}(\gamma - 1)). \tag{8.73}$$

We pick $\sqrt{n}\,\gamma = Q^{-1}(\alpha) = Q^{-1}(0.05) = 1.64$ to meet the significance level requirement and we pick $\sqrt{n}(\gamma - 1) = Q^{-1}(0.99) = -2.33$ to meet the detection probability requirement. We then obtain $\gamma = 0.41$ and $n = 16$.
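Subtracting the two conditions in Example 8.24 gives $\sqrt{n} = Q^{-1}(\alpha) - Q^{-1}(P_D)$, which the following Python sketch evaluates. The function names are hypothetical, and rounding n up to the next integer and then recomputing the threshold from the false alarm condition is one reasonable convention assumed here.

```python
import math
from statistics import NormalDist

def q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inv(p):
    """Inverse of Q: the value x with Q(x) = p."""
    return NormalDist().inv_cdf(1.0 - p)

def radar_design(alpha, p_detect):
    """Sample size and threshold for H0: mean 0 vs H1: mean 1, unit variance."""
    root_n = q_inv(alpha) - q_inv(p_detect)  # sqrt(n)*gamma - sqrt(n)*(gamma - 1)
    n = math.ceil(root_n**2)                 # round up so both requirements hold
    gamma = q_inv(alpha) / math.sqrt(n)      # keeps the false alarm rate at alpha
    return n, gamma

n, gamma = radar_design(alpha=0.05, p_detect=0.99)
achieved_pd = q(math.sqrt(n) * (gamma - 1.0))
print(f"n = {n}, gamma = {gamma:.2f}, achieved P_D = {achieved_pd:.4f}")
```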


Different criteria can be used to select the rejection region for rejecting the null hypothesis. A common approach is to select $\gamma$ so the Type I error is $\alpha$. This approach, however, does not completely specify the rejection region; for example, we may have a choice between one-sided and two-sided tests. The Neyman-Pearson criterion identifies the rejection region in a simple binary hypothesis test in which the Type I error is equal to $\alpha$ and where the Type II error $\beta$ is minimized. The following result shows how to obtain the Neyman-Pearson test.


Theorem Neyman-Pearson Hypothesis Test
Assume that X is a continuous random variable. The decision rule that minimizes the Type II error probability $\beta$ subject to the constraint that the Type I error probability is equal to $\alpha$ is given by:

$$\begin{aligned}
&\text{Accept } H_0 \text{ if } \mathbf{x} \in R^c = \left\{ \mathbf{x}: \Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} < \kappa \right\} \\
&\text{Accept } H_1 \text{ if } \mathbf{x} \in R = \left\{ \mathbf{x}: \Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} \ge \kappa \right\}
\end{aligned} \tag{8.74}$$

where $\kappa$ is chosen so that:

$$\alpha = \int_{\Lambda(\mathbf{x}_n) \ge \kappa} f_X(\mathbf{x}_n \mid H_0)\, d\mathbf{x}_n. \tag{8.75}$$


Note that terms where $\Lambda(\mathbf{x}) = \kappa$ can be assigned to either $R$ or $R^c$. We prove the theorem at the end of the section. $\Lambda(\mathbf{x})$ is called the likelihood ratio function and is given by the ratio of the likelihood of the observation $\mathbf{x}$ given $H_1$ to the likelihood given $H_0$. The Neyman-Pearson test rejects the null hypothesis whenever the likelihood ratio equals or exceeds the threshold $\kappa$. A more compact form of writing the test is:

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \kappa. \tag{8.76}$$

Since the log function is an increasing function, we can equivalently work with the log likelihood ratio:

$$\ln \Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \ln \kappa. \tag{8.77}$$


Example 8.25 Testing the Means of Two Gaussian Random Variables 


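Let $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ be iid samples of Gaussian random variables with known variance $\sigma_X^2$. For $m_1 > m_0$, find the Neyman-Pearson test for:

$H_0$: X is Gaussian with $\mu = m_0$ and $\sigma_X^2$ known
$H_1$: X is Gaussian with $\mu = m_1$ and $\sigma_X^2$ known.

The likelihood functions for the observation vector $\mathbf{x}$ are:

$$f_X(\mathbf{x} \mid H_0) = \frac{1}{\sigma_X^n (2\pi)^{n/2}} e^{-\left( (x_1 - m_0)^2 + (x_2 - m_0)^2 + \cdots + (x_n - m_0)^2 \right)/2\sigma_X^2}$$

$$f_X(\mathbf{x} \mid H_1) = \frac{1}{\sigma_X^n (2\pi)^{n/2}} e^{-\left( (x_1 - m_1)^2 + (x_2 - m_1)^2 + \cdots + (x_n - m_1)^2 \right)/2\sigma_X^2}$$

and so the log likelihood ratio test is:

$$\ln \Lambda(\mathbf{x}) = -\frac{1}{2\sigma_X^2} \left[ 2(m_0 - m_1)n\bar{X}_n - n(m_0^2 - m_1^2) \right] \underset{H_0}{\overset{H_1}{\gtrless}} \ln \kappa$$

$$\left[ 2(m_0 - m_1)n\bar{X}_n - n(m_0^2 - m_1^2) \right] \underset{H_0}{\overset{H_1}{\lessgtr}} -2\sigma_X^2 \ln \kappa$$

$$\bar{X}_n \underset{H_0}{\overset{H_1}{\gtrless}} \frac{-2\sigma_X^2 \ln \kappa + n(m_0^2 - m_1^2)}{2(m_0 - m_1)n} \triangleq \gamma. \tag{8.78}$$

Note the change in the direction of the inequality when we divided both sides by the negative number $-2\sigma_X^2$. The direction changes once more in the last step because $2(m_0 - m_1)n$ is also negative when $m_1 > m_0$. The threshold value $\gamma$ is selected so that the significance level is $\alpha$:

$$\alpha = P[\bar{X}_n > \gamma \mid H_0] = \int_{\gamma}^{\infty} \frac{1}{\sqrt{2\pi\sigma_X^2/n}} e^{-(x - m_0)^2/(2\sigma_X^2/n)}\, dx = Q\left( \frac{\sqrt{n}(\gamma - m_0)}{\sigma_X} \right)$$

and thus $\sqrt{n}(\gamma - m_0)/\sigma_X = z_\alpha$, and $\gamma = m_0 + z_\alpha \sigma_X/\sqrt{n}$.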
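The equivalence between thresholding the log likelihood ratio and thresholding the sample mean in Example 8.25 can be checked numerically. In the Python sketch below, the parameter values, the seed, and the helper names `log_likelihood_ratio` and `gamma_from_kappa` are assumptions chosen for illustration. The code draws random samples and confirms that both forms of the test reach the same decision.

```python
import math
import random

def log_likelihood_ratio(xs, m0, m1, sigma):
    """ln of f(x | H1) / f(x | H0) for iid Gaussian samples with common variance."""
    return sum((x - m0) ** 2 - (x - m1) ** 2 for x in xs) / (2.0 * sigma**2)

def gamma_from_kappa(kappa, n, m0, m1, sigma):
    """Sample-mean threshold equivalent to the likelihood ratio threshold kappa."""
    numerator = -2.0 * sigma**2 * math.log(kappa) + n * (m0**2 - m1**2)
    return numerator / (2.0 * (m0 - m1) * n)

m0, m1, sigma, n, kappa = 0.0, 1.0, 2.0, 8, 3.0   # arbitrary illustrative values
gamma = gamma_from_kappa(kappa, n, m0, m1, sigma)

rng = random.Random(1)
decisions_agree = True
for _ in range(1000):
    xs = [rng.gauss(m0, sigma) for _ in range(n)]
    by_ratio = log_likelihood_ratio(xs, m0, m1, sigma) >= math.log(kappa)
    by_mean = sum(xs) / n >= gamma
    decisions_agree = decisions_agree and (by_ratio == by_mean)

print(f"gamma = {gamma:.4f}, decisions agree: {decisions_agree}")
```

With $\kappa = 1$ the threshold reduces to the midpoint $(m_0 + m_1)/2$, which is a quick sanity check on the sign conventions.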

The radar detection problem is a special case of this problem, and after substituting for the appropriate variables, we see that the Neyman-Pearson test leads to the same choice of rejection region. Therefore we know that the test in Example 8.24 also minimizes the Type II error probability, and maximizes the detection probability $P_D = 1 - \beta$.


The Neyman-Pearson test also applies when X is a discrete random variable, with the likelihood function defined as follows:

$$\Lambda(\mathbf{x}) = \frac{p_X(\mathbf{x} \mid H_1)}{p_X(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \kappa \tag{8.79}$$

where the threshold $\kappa$ is the largest value for which

$$\sum_{\Lambda(\mathbf{x}_n) \ge \kappa} p_X(\mathbf{x}_n \mid H_0) = \alpha. \tag{8.80}$$

Note that equality cannot always be achieved in the above equation when dealing with discrete random variables.

The maximum likelihood test for a simple binary hypothesis can be obtained as the special case where $\kappa = 1$ in Eq. (8.76). In this case, we have:

$$\Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

which is equivalent to

$$f_X(\mathbf{x} \mid H_1) \underset{H_0}{\overset{H_1}{\gtrless}} f_X(\mathbf{x} \mid H_0). \tag{8.81}$$

The test simply selects the hypothesis with the higher likelihood. Note that this decision rule can be readily generalized to the case of testing multiple simple hypotheses.
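A maximum likelihood decision among several simple hypotheses can be sketched as follows. The example assumes scalar Gaussian observations with a common known variance and hypotheses that differ only in their means; these assumptions and the function names are illustrative.

```python
import math

def gaussian_pdf(x, mean, sigma=1.0):
    """Gaussian pdf with the given mean and standard deviation, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))

def ml_decision(x, means, sigma=1.0):
    """Index of the hypothesis whose pdf is largest at the observation x."""
    likelihoods = [gaussian_pdf(x, m, sigma) for m in means]
    return max(range(len(means)), key=likelihoods.__getitem__)

radar_means = [0.0, 1.0]          # H0 and H1 of the radar detection problem
print(ml_decision(0.3, radar_means))        # observation closer to mean 0
print(ml_decision(0.8, radar_means))        # observation closer to mean 1
print(ml_decision(2.6, [0.0, 1.0, 3.0]))    # three simple hypotheses
```

For equal-variance Gaussian hypotheses the rule reduces to picking the nearest mean, so in the radar problem the maximum likelihood threshold sits at 0.5.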

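We conclude this subsection by proving the Neyman-Pearson result. We wish to minimize $\beta$ given by Eq. (8.68), subject to the constraint that the Type I error probability is $\alpha$, Eq. (8.75). We use Lagrange multipliers to perform this constrained minimization:

$$\begin{aligned}
G &= \int_{R^c} f_X(\mathbf{x}_n \mid H_1)\, d\mathbf{x}_n + \lambda \left\{ \int_{R} f_X(\mathbf{x}_n \mid H_0)\, d\mathbf{x}_n - \alpha \right\} \\
&= \int_{R^c} f_X(\mathbf{x}_n \mid H_1)\, d\mathbf{x}_n + \lambda \left\{ 1 - \int_{R^c} f_X(\mathbf{x}_n \mid H_0)\, d\mathbf{x}_n - \alpha \right\} \\
&= \lambda(1 - \alpha) + \int_{R^c} \left\{ f_X(\mathbf{x}_n \mid H_1) - \lambda f_X(\mathbf{x}_n \mid H_0) \right\} d\mathbf{x}_n.
\end{aligned}$$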


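For any $\lambda > 0$, we minimize G by including in $R^c$ all points $\mathbf{x}_n$ for which the term in braces is negative, that is,

$$R^c = \left\{ \mathbf{x}_n: f_X(\mathbf{x}_n \mid H_1) - \lambda f_X(\mathbf{x}_n \mid H_0) < 0 \right\} = \left\{ \mathbf{x}_n: \frac{f_X(\mathbf{x}_n \mid H_1)}{f_X(\mathbf{x}_n \mid H_0)} < \lambda \right\}.$$

We choose $\lambda$ to meet the constraint:

$$\alpha = \int_{\left\{ \mathbf{x}_n:\, \frac{f_X(\mathbf{x}_n \mid H_1)}{f_X(\mathbf{x}_n \mid H_0)} > \lambda \right\}} f_X(\mathbf{x}_n \mid H_0)\, d\mathbf{x}_n = \int_{\{\mathbf{x}_n:\, \Lambda(\mathbf{x}_n) > \lambda\}} f_X(\mathbf{x}_n \mid H_0)\, d\mathbf{x}_n = \int_{\lambda}^{\infty} f_\Lambda(y \mid H_0)\, dy$$

where $f_\Lambda(y \mid H_0)$ is the pdf of the likelihood function $\Lambda(\mathbf{x})$. The likelihood function is the ratio of two pdfs, so it is always positive. Therefore the integral on the right-hand side will range over positive values of y, and the final choice of $\lambda$ will be positive as required above.


8.5.3 Testing Composite Hypotheses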


Many situations in practice lead to the testing of a simple null hypothesis against a composite alternative hypothesis. This happens because frequently one hypothesis is very well specified and the other is not. Examples are not hard to find. In the testing of a “new longer lasting” battery, the natural null hypothesis is that the mean of the lifetime is unchanged, that is, $\mu = \mu_0$, and the alternative hypothesis is that the mean has increased, that is, $\mu > \mu_0$. In another example, we may wish to test whether a certain voltage signal has a dc component. In this case, the null hypothesis is $\mu = 0$ and the alternative hypothesis is $\mu \ne 0$. In a third example, we may wish to determine whether response times in a certain system have become more variable. The null hypothesis is now $\sigma_X^2 = \sigma_0^2$ and the alternative hypothesis is $\sigma_X^2 > \sigma_0^2$.

All the above examples test a simple null hypothesis, $\theta = \theta_0$, against a composite alternative hypothesis such as $\theta \ne \theta_0$, $\theta > \theta_0$, or $\theta < \theta_0$. We now consider the design of tests for these scenarios. As before, we require that the rejection region R be selected so that the Type I error probability is $\alpha$. We are now interested in the power $1 - \beta(\theta)$ of the test. $\beta(\theta)$ is the probability that a test accepts the null hypothesis when the true parameter is $\theta$. The power $1 - \beta(\theta)$ is then the probability of rejecting the null hypothesis when the true parameter is $\theta$. Therefore, we want $1 - \beta(\theta)$ to be near 1 when $\theta \ne \theta_0$ and small when $\theta = \theta_0$.


Example 8.26 One-Sided Test for Mean of a Gaussian Random Variable 
(Known Variance) 


Revisit Example 8.22 where we developed a test to decide whether a new design yields longer-lasting batteries. Plot the power of the test as a function of the true mean $\mu$. Assume a significance level of $\alpha = 0.01$ and consider the cases where the test uses n = 4, 9, 25, and 100 observations.

This test involves a simple hypothesis with a Gaussian random variable with known mean and variance, and a composite alternative hypothesis with a Gaussian random variable with known variance but unknown mean:

$H_0$: X is Gaussian with $\mu = 150$ and $\sigma_X^2 = 16$
$H_1$: X is Gaussian with $\mu > 150$ and $\sigma_X^2 = 16$.

The rejection region has the form $R = \{\mathbf{x}: \bar{x}_n - 150 > c\}$ where c is chosen so:

$$\alpha = P[\bar{X}_n - 150 > c \mid H_0] = P\left[ \frac{\bar{X}_n - 150}{\sqrt{16/n}} > \frac{c}{\sqrt{16/n}} \right] = Q\left( \frac{c\sqrt{n}}{4} \right).$$

Letting $z_\alpha$ be the critical value for $\alpha$, then $c = 4z_\alpha/\sqrt{n}$, and:

$$R = \{\mathbf{x}: \bar{x}_n - 150 > 4z_\alpha/\sqrt{n}\}.$$

The Type II error probability depends on the true mean $\mu$ and is given by:

$$\beta(\mu) = P[\bar{X}_n - 150 \le 4z_\alpha/\sqrt{n} \mid \mu] = P\left[ \frac{\bar{X}_n - 150}{\sqrt{16/n}} \le z_\alpha \,\Big|\, \mu \right].$$

If the true pdf of X has mean $\mu$ and variance 16, then the sample mean $\bar{X}_n$ is Gaussian with mean $\mu$ and variance 16/n. We need to rearrange the expression in the probability in terms of the standard Gaussian random variable $(\bar{X}_n - \mu)/\sqrt{16/n}$:

$$\begin{aligned}
\beta(\mu) &= P\left[ \frac{\bar{X}_n - 150}{\sqrt{16/n}} \le z_\alpha \,\Big|\, \mu \right] = P\left[ \frac{\bar{X}_n - 150 - \mu}{\sqrt{16/n}} \le z_\alpha - \frac{\mu}{\sqrt{16/n}} \,\Big|\, \mu \right] \\
&= P\left[ \frac{\bar{X}_n - \mu}{\sqrt{16/n}} \le z_\alpha - \frac{\mu - 150}{\sqrt{16/n}} \,\Big|\, \mu \right] = 1 - Q\left( z_\alpha - \frac{\mu - 150}{\sqrt{16/n}} \right).
\end{aligned}$$

For $\alpha = 0.01$, $z_\alpha = 2.326$. The power function is then:

$$1 - \beta(\mu) = Q\left( z_\alpha - \frac{\mu - 150}{\sqrt{16/n}} \right) = Q\left( 2.326 - \frac{\mu - 150}{\sqrt{16/n}} \right).$$

The ideal curve for the power function in this case is equal to $\alpha$ when $\mu = 150$, which is when the null hypothesis is true, and then increases quickly as the true mean $\mu$ increases beyond 150. Figure 8.5 shows that the power curve for the test under consideration does drop near $\mu = 150$, and that the curve approaches the ideal shape as the number of observations n is increased.
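The power function derived in Example 8.26 is easy to evaluate numerically. The Python sketch below tabulates it for the four sample sizes in the example; the function names and the grid of true means are assumptions made for illustration, and the printed table stands in for the plot.

```python
import math
from statistics import NormalDist

def q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def power(mu, n, m0=150.0, variance=16.0, alpha=0.01):
    """Power 1 - beta(mu) of the one-sided test for the mean with known variance."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return q(z_alpha - (mu - m0) / math.sqrt(variance / n))

sample_sizes = [4, 9, 25, 100]
for mu in (150.0, 151.0, 152.0, 154.0):
    row = "  ".join(f"n={n}: {power(mu, n):.4f}" for n in sample_sizes)
    print(f"mu={mu:5.1f}  {row}")
```

At $\mu = 150$ every column equals the significance level, and for any fixed $\mu > 150$ the power grows with n.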


FIGURE 8.5
Power curve for one-sided test of Gaussian means.


If we have two tests for a simple binary hypothesis that achieve a significance level $\alpha$, choosing between the two tests is simple. We choose the test with the smaller Type II error probability $\beta$, which is equivalent to picking the test with higher power. Selecting between two tests is not quite as simple when we test a simple null hypothesis against a composite alternative hypothesis. The power $1 - \beta$ of a test will now vary with the true value of the alternative $\theta_1$. The perfect hypothesis test would be one that achieves the significance level $\alpha$, and that gives the highest power for each value of the alternative hypothesis. We call such a test the uniformly most powerful (UMP) test. The following example shows that the one-sided test developed in Example 8.25 is uniformly most powerful.


Example 8.27 One-Sided Test for Gaussian Means is UMP 
In Example 8.25 we developed a test for two simple hypotheses:

$H_0$: X is Gaussian with $\mu = m_0$ and $\sigma_X^2$ known
$H_1$: X is Gaussian with $\mu = m_1$ and $\sigma_X^2$ known.

We used the Neyman-Pearson result to obtain the most powerful test for comparing $H_0$: $\mu = m_0$ and $H_1$: $\mu = m_1$. The rejection region of the test is:

$$\bar{X}_n > m_0 + z_\alpha \sigma_X/\sqrt{n}. \tag{8.82}$$

Note that in this test, the rejection region does not depend on the value of the alternative $m_1$. Therefore the Neyman-Pearson test for $H_0$: $\mu = m_0$ against $H_1$: $\mu = m_1$, for any $m_1 > m_0$, will lead to the same test specified by Eq. (8.82). It then follows that Eq. (8.82) is the uniformly most powerful test for

$H_0$: X is Gaussian with $\mu = m_0$ and $\sigma_X^2$ known
$H_1$: X is Gaussian with $\mu > m_0$ and $\sigma_X^2$ known.


By following the same development of the previous example, we can readily show that the test of $H_0$: $\mu = m_0$ against $H_1$: $\mu < m_0$ has rejection region

$$\bar{X}_n < m_0 - z_\alpha \sigma_X/\sqrt{n} \tag{8.83}$$

and is uniformly most powerful as well. On the other hand, the above results are not useful in finding a uniformly most powerful test for $H_0$: $\mu = m_0$ against $H_1$: $\mu \ne m_0$, where we need to deal with both $\mu < m_0$ and $\mu > m_0$, and hence with tests that have different rejection regions. (See Problem 8.62.)


Example 8.28 Two-Sided Test for Mean of a Gaussian Random Variable (Known 
Variance) 


Develop a test to decide whether a certain voltage signal has a dc component. Assume that the signal is Gaussian distributed and is known to have unit variance. Assuming that $\alpha = 0.01$, how many samples are required so that a dc voltage of 0.25 volts would be rejected with probability 0.90?

This test involves the mean of a Gaussian random variable with known variance:

$H_0$: X is Gaussian with $\mu = 0$ and $\sigma_X^2 = 1$
$H_1$: X is Gaussian with $\mu \ne 0$ and $\sigma_X^2 = 1$.

When $H_0$ is true, the sample mean $\bar{X}_n$ is Gaussian with mean 0 and variance 1/n. The rejection region involves two tails and has the form $R = \{\mathbf{x}: |\bar{x}_n| > c\}$ where c is chosen so:

$$\alpha = P[|\bar{X}_n| > c \mid H_0] = 2P\left[ \frac{\bar{X}_n}{1/\sqrt{n}} > \frac{c}{1/\sqrt{n}} \right] = 2Q(c\sqrt{n}). \tag{8.84}$$

Letting $z_{\alpha/2}$ be the critical value for $\alpha/2$, then $c = z_{\alpha/2}/\sqrt{n}$, and the rejection region is:

$$R = \{\mathbf{x}: |\bar{x}_n| > z_{\alpha/2}/\sqrt{n}\}.$$

When the true mean is $\mu$, the sample mean has mean $\mu$ and variance 1/n, so the Type II error probability is given by:

$$\begin{aligned}
\beta(\mu) &= P[|\bar{X}_n| \le z_{\alpha/2}/\sqrt{n} \mid \mu] = P\left[ -z_{\alpha/2} - \sqrt{n}\mu \le \frac{\bar{X}_n - \mu}{1/\sqrt{n}} \le z_{\alpha/2} - \sqrt{n}\mu \,\Big|\, \mu \right] \\
&= Q(-z_{\alpha/2} - \sqrt{n}\mu) - Q(z_{\alpha/2} - \sqrt{n}\mu) = Q(\sqrt{n}\mu - z_{\alpha/2}) - Q(\sqrt{n}\mu + z_{\alpha/2}),
\end{aligned}$$

where the last step uses $Q(-x) = 1 - Q(x)$.

For $\alpha = 0.01$, $z_{\alpha/2} = 2.576$. The Type II error probability for $\mu = 0.25$ is then:

$$\beta(0.25) = Q(0.25\sqrt{n} - 2.576) - Q(0.25\sqrt{n} + 2.576).$$

The above equation can be solved for n by trial and error. Since Q(x) is a decreasing function, and since the arguments of the two Q functions differ by more than 5, we can neglect the second term so that

$$\beta(0.25) \approx Q(0.25\sqrt{n} - 2.576).$$

Letting $z_\beta$ be the critical value for $\beta$, then $z_\beta = 0.25\sqrt{n} - 2.576$, and

$$n = \left( \frac{z_\beta + 2.576}{0.25} \right)^2.$$

If $\beta = 1 - 0.90 = 0.10$, then $z_\beta = 1.282$, and the required number of samples is $n = 238$.
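The sample-size calculation in Example 8.28 can be checked against the exact two-term expression for the Type II error probability. The Python sketch below computes both; the function names `beta_two_sided` and `approximate_sample_size` are hypothetical.

```python
import math
from statistics import NormalDist

def q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def beta_two_sided(mu, n, alpha=0.01):
    """Exact Type II error probability of the two-sided test (unit variance)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return q(-z - math.sqrt(n) * mu) - q(z - math.sqrt(n) * mu)

def approximate_sample_size(mu, alpha, beta_target):
    """Sample size from the one-term approximation n = ((z_{alpha/2} + z_beta)/mu)^2."""
    z_half_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(1.0 - beta_target)
    return ((z_half_alpha + z_beta) / mu) ** 2

n_approx = approximate_sample_size(mu=0.25, alpha=0.01, beta_target=0.10)
beta_at_238 = beta_two_sided(mu=0.25, n=238)
print(f"approximate n = {n_approx:.2f}")
print(f"exact beta at n = 238: {beta_at_238:.4f}")
```

The approximate value lies slightly above 238, so rounding to the nearest integer reproduces the figure quoted in the example; a strict guarantee on the power would round up instead.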


In Examples 8.27 and 8.28 we have developed hypothesis tests involving the means of Gaussian random variables where the variances are known. The definition of the rejection regions in these tests depends on the fact that the sample mean $\bar{X}_n$ is a Gaussian random variable. Therefore, these hypothesis tests can also be used in situations where the individual observations are not Gaussian, but where the number of samples n is sufficiently large to apply the central limit theorem and approximate $\bar{X}_n$ by a Gaussian random variable.


Example 8.29 Two-Sided Test for Mean of a Gaussian Random Variable 
(Unknown Variance) 


Develop a test to decide whether a certain voltage signal has a dc component equal to 
mo = 1.5 volts. Assume that the signal samples are Gaussian but the variance is unknown. Apply 
the test at a 5% level in an experiment where a set of 9 measurements has resulted in a sample 
mean of 1.75 volts and a sample variance of 2.25 volts. 

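We now are considering two composite hypotheses:

$$H_0: X \text{ is Gaussian with } \mu = m_0 \text{ and } \sigma_X^2 \text{ unknown}$$

$$H_1: X \text{ is Gaussian with } \mu \neq m_0 \text{ and } \sigma_X^2 \text{ unknown.}$$

We proceed by emulating the solution in the case where the variance is known. We approximate
the statistic $(\bar{X}_n - m_0)/(\sigma_X/\sqrt{n})$ by one that uses the sample variance given by Eq. (8.17):

$$T = \frac{\bar{X}_n - m_0}{\hat{\sigma}_n/\sqrt{n}}. \tag{8.85}$$

From the previous section (Eq. 8.64), we know that $T$ has a Student’s t-distribution. For the
rejection region we use:

$$\tilde{R} = \left\{ \mathbf{x} : \left| \frac{\bar{x}_n - m_0}{\hat{\sigma}_n/\sqrt{n}} \right| > c \right\}.$$

The threshold $c$ is chosen to provide the desired significance level:

$$\alpha = 1 - P\left[ -c \le \frac{\bar{X}_n - m_0}{\hat{\sigma}_n/\sqrt{n}} \le c \,\middle|\, H_0 \right] = 1 - \left( F_{n-1}(c) - F_{n-1}(-c) \right),$$

where $F_{n-1}(t)$ is the cdf of the Student’s t random variable with $n - 1$ degrees of freedom. Let
$t_{\alpha/2,n-1}$ be the value for which $\alpha/2 = 1 - F_{n-1}(t_{\alpha/2,n-1}) = F_{n-1}(-t_{\alpha/2,n-1})$; then $c = t_{\alpha/2,n-1}$. The
decision rule is then:

$$\text{Accept } H_0 \text{ if } \left| \frac{\bar{x}_n - m_0}{\hat{\sigma}_n/\sqrt{n}} \right| \le t_{\alpha/2,n-1}$$

$$\text{Accept } H_1 \text{ if } \left| \frac{\bar{x}_n - m_0}{\hat{\sigma}_n/\sqrt{n}} \right| > t_{\alpha/2,n-1}. \tag{8.86}$$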


The threshold for $\alpha/2 = 0.025$ and $n - 1 = 9 - 1 = 8$ degrees of freedom is $t_{0.025,8} = 2.306$. The test statistic is
$(1.75 - 1.5)/(2.25/9)^{1/2} = 0.5$, which is less than 2.306. Therefore the null hypothesis is accepted;
the data are consistent with the assertion that the dc voltage is 1.5 volts.
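
The arithmetic of this example can be checked with a short Python sketch. The function name `t_statistic` is an illustrative choice, not something defined in the text, and the critical value 2.306 is copied from the example rather than computed.

```python
import math

def t_statistic(sample_mean, m0, sample_var, n):
    """Statistic T of Eq. (8.85): (sample mean - m0) / sqrt(sample variance / n)."""
    return (sample_mean - m0) / math.sqrt(sample_var / n)

# Critical value t_{0.025,8} quoted in the example (assumed here, not computed).
T_CRIT = 2.306

t = t_statistic(sample_mean=1.75, m0=1.5, sample_var=2.25, n=9)
reject_h0 = abs(t) > T_CRIT  # two-sided rule of Eq. (8.86)
print(f"T = {t:.3f}, reject H0: {reject_h0}")
```

The statistic evaluates to 0.5, well inside the acceptance region, in agreement with the conclusion of the example.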


One-sided tests for testing the mean of Gaussian random variables when the variance 
is unknown can be developed using the approach in the previous example. Recall from 
Table 8.2 that the critical values of the Student’s t-distribution approach those of a Gaussian
random variable as the number of samples is increased. Thus the Student’s t hypothesis 
tests are only necessary when dealing with a small number of Gaussian observations. 


Example 8.30 Testing the Variance of a Gaussian Random Variable 


We wish to determine whether the variability of the response times in a certain system has changed
from the past value of $\sigma_X^2 = 35$ sec². We measure a sample variance of 37 sec² for $n = 30$
measurements of the response time. Determine whether the null hypothesis, $\sigma_X^2 = 35$, should be
rejected against the alternative hypothesis, $\sigma_X^2 \neq 35$, at a 5% significance level.

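We now have:

$$H_0: X \text{ is Gaussian with } \sigma_X^2 = \sigma_0^2 \text{ and } m \text{ unknown}$$

$$H_1: X \text{ is Gaussian with } \sigma_X^2 \neq \sigma_0^2 \text{ and } m \text{ unknown.}$$

In the previous section we showed that the statistic $(n - 1)\hat{\sigma}_n^2/\sigma_0^2$ is a chi-square random
variable with $n - 1$ degrees of freedom if $X$ has variance $\sigma_0^2$. We consider a rejection region in which
$H_0$ is rejected if the ratio of the sample variance to $\sigma_0^2$ is too small or too large, so the
acceptance region is:

$$\tilde{R}^c = \left\{ \mathbf{x} : a \le \frac{(n-1)\hat{\sigma}_n^2}{\sigma_0^2} \le b \right\}.$$

We choose the threshold values $a$ and $b$ as we did in Eq. (8.59) to provide the desired significance
level:

$$1 - \alpha = P\left[ a \le \frac{(n-1)\hat{\sigma}_n^2}{\sigma_0^2} \le b \right] = P\left[ \chi^2_{1-\alpha/2,n-1} < \frac{(n-1)\hat{\sigma}_n^2}{\sigma_0^2} < \chi^2_{\alpha/2,n-1} \right],$$

where $\chi^2_{\alpha/2,n-1}$ and $\chi^2_{1-\alpha/2,n-1}$ are critical values of the chi-square distribution. The decision rule is
then:

$$\text{Accept } H_0 \text{ if } \chi^2_{1-\alpha/2,n-1} < \frac{(n-1)\hat{\sigma}_n^2}{\sigma_0^2} < \chi^2_{\alpha/2,n-1}$$

$$\text{Accept } H_1 \text{ otherwise.} \tag{8.87}$$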


Table 8.3 gives the required critical values $\chi^2_{0.025,29} = 45.72$ and $\chi^2_{0.975,29} = 16.04$, so the
acceptance region is:

$$16.04 < \frac{(n-1)\hat{\sigma}_n^2}{\sigma_0^2} < 45.72.$$

The sample variance is 37 sec² and the statistic is $(n - 1)\hat{\sigma}_n^2/\sigma_0^2 = 29(37)/35 = 30.66$. This
statistic is inside the acceptance region so we accept the null hypothesis. The data do not suggest a
change in the variability of response times.
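
The two-sided variance test of this example reduces to a single comparison, sketched below in Python. The function name is hypothetical, and the two chi-square critical values are the ones quoted from Table 8.3 in the example; they are entered as constants rather than computed.

```python
def variance_test_statistic(sample_var, sigma0_sq, n):
    """Statistic (n - 1) * sample variance / sigma_0^2 used in Eq. (8.87)."""
    return (n - 1) * sample_var / sigma0_sq

# Critical values for 29 degrees of freedom, as quoted in the example.
CHI2_LOWER = 16.04  # chi-square critical value at 1 - alpha/2 = 0.975
CHI2_UPPER = 45.72  # chi-square critical value at alpha/2 = 0.025

stat = variance_test_statistic(sample_var=37.0, sigma0_sq=35.0, n=30)
accept_h0 = CHI2_LOWER < stat < CHI2_UPPER
print(f"statistic = {stat:.2f}, accept H0: {accept_h0}")
```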


8.5.4 Confidence Intervals and Hypothesis Testing


Before concluding this section, we discuss the relationship between confidence intervals
and hypothesis testing. Consider the acceptance region for a two-sided test involving
the mean of a Gaussian random variable with known variance (Example 8.27):
$H_0: \mu = m_0$ vs. $H_1: \mu \neq m_0$. In Section 8.4 we found the equivalence of the following
events:

$$\left\{ -z_{\alpha/2} \le \frac{\bar{X}_n - \mu}{\sigma_X/\sqrt{n}} \le z_{\alpha/2} \right\} = \left\{ \bar{X}_n - \frac{z_{\alpha/2}\sigma_X}{\sqrt{n}} \le \mu \le \bar{X}_n + \frac{z_{\alpha/2}\sigma_X}{\sqrt{n}} \right\}.$$

The null hypothesis is accepted when the sample mean is inside the interval in the event
on the left-hand side. The endpoints of the event have been selected so that the probability
of the event is $1 - \alpha$ when $H_0$ is true. Now, when $H_0$ is true we have $\mu = m_0$, so the
event on the right-hand side states that we accept $H_0$ when $m_0$ is inside the interval
$[\bar{X}_n - z_{\alpha/2}\sigma_X/\sqrt{n},\ \bar{X}_n + z_{\alpha/2}\sigma_X/\sqrt{n}]$. Thus we conclude that the hypothesis test will not
reject $H_0$ in favor of $H_1$ if $m_0$ is in the $1 - \alpha$ confidence interval for $\mu$. Similar
relationships exist between one-sided hypothesis tests and confidence intervals that attempt to
find lower or upper bounds for parameters of interest.
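
The equivalence between the two-sided test and the confidence interval can be illustrated numerically. The following Python sketch uses only the standard library; the function names and the numerical values (sample mean, standard deviation, sample size, candidate means) are assumptions chosen for illustration and do not come from the text.

```python
import math
from statistics import NormalDist

def z_critical(alpha):
    """Critical value z_{alpha/2} of the zero-mean, unit-variance Gaussian."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def accepts_h0(xbar, m0, sigma, n, alpha):
    """Two-sided test with known variance: accept H0 if |Z| <= z_{alpha/2}."""
    return abs(xbar - m0) / (sigma / math.sqrt(n)) <= z_critical(alpha)

def confidence_interval(xbar, sigma, n, alpha):
    """The 1 - alpha confidence interval for the mean with known variance."""
    half_width = z_critical(alpha) * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Illustrative values (assumed, not from the text).
xbar, sigma, n, alpha = 1.75, 1.5, 9, 0.05
low, high = confidence_interval(xbar, sigma, n, alpha)
candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
agreement = [accepts_h0(xbar, m0, sigma, n, alpha) == (low <= m0 <= high)
             for m0 in candidates]
print(all(agreement))
```

For every candidate value of $m_0$, the test accepts $H_0$ exactly when $m_0$ lies inside the interval.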


8.5.5 Summary of Hypothesis Tests


This section has developed many of the most common hypothesis tests used in practice.
We developed the tests in the context of specific examples. Table 8.5 summarizes
the basic hypothesis tests that were developed in this section. The table presents the
tests with the general test statistics and parameters.


TABLE 8.5 Summary of basic hypothesis tests for Gaussian and non-Gaussian random variables.

Case: Gaussian random variable, $\sigma_X^2$ known; or non-Gaussian random variable, $n$ large, $\sigma_X^2$ known
Statistic: $Z = (\bar{X}_n - m_0)/(\sigma_X/\sqrt{n})$

Hypothesis Test                                               Rejection Region
$H_0: \mu = m_0$ vs. $H_1: \mu \neq m_0$                      $|Z| \ge z_{\alpha/2}$
$H_0: \mu = m_0$ vs. $H_1: \mu > m_0$                         $Z \ge z_{\alpha}$
$H_0: \mu = m_0$ vs. $H_1: \mu < m_0$                         $Z \le -z_{\alpha}$

Case: Gaussian random variable, $\sigma_X^2$ unknown
Statistic: $T = (\bar{X}_n - m_0)/(\hat{\sigma}_n/\sqrt{n})$

Hypothesis Test                                               Rejection Region
$H_0: \mu = m_0$ vs. $H_1: \mu \neq m_0$                      $|T| \ge t_{\alpha/2,n-1}$
$H_0: \mu = m_0$ vs. $H_1: \mu > m_0$                         $T \ge t_{\alpha,n-1}$
$H_0: \mu = m_0$ vs. $H_1: \mu < m_0$                         $T \le -t_{\alpha,n-1}$

Case: Gaussian random variable, $\mu$ unknown
Statistic: $\chi^2 = (n-1)\hat{\sigma}_n^2/\sigma_0^2$

Hypothesis Test                                               Rejection Region
$H_0: \sigma_X^2 = \sigma_0^2$ vs. $H_1: \sigma_X^2 \neq \sigma_0^2$   $\chi^2 \le \chi^2_{1-\alpha/2,n-1}$ or $\chi^2 \ge \chi^2_{\alpha/2,n-1}$
$H_0: \sigma_X^2 = \sigma_0^2$ vs. $H_1: \sigma_X^2 > \sigma_0^2$      $\chi^2 \ge \chi^2_{\alpha,n-1}$
$H_0: \sigma_X^2 = \sigma_0^2$ vs. $H_1: \sigma_X^2 < \sigma_0^2$      $\chi^2 \le \chi^2_{1-\alpha,n-1}$


8.6 BAYESIAN DECISION METHODS


In the previous sections we developed methods for estimating and for drawing inferences
about a parameter $\theta$ assuming that $\theta$ is unknown but not random. In this section,
we explore methods that assume that $\theta$ is a random variable and that we have a priori
knowledge of its distribution. This new assumption leads to new methods for addressing
estimation and hypothesis testing problems.


8.6.1 Bayes Hypothesis Testing


Consider a simple binary hypothesis problem where we are to decide between two hypotheses
based on a random sample $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$:

$$H_0: f_X(\mathbf{x} \mid H_0)$$

$$H_1: f_X(\mathbf{x} \mid H_1)$$

and we assume that we know that $H_0$ occurs with probability $p_0$ and $H_1$ with probability
$p_1 = 1 - p_0$. There are four possible outcomes of the hypothesis test, and we assign
a cost to each outcome as a measure of its relative importance:


1. $H_0$ true and decide $H_0$: Cost = $C_{00}$
2. $H_0$ true and decide $H_1$ (Type I error): Cost = $C_{01}$
3. $H_1$ true and decide $H_0$ (Type II error): Cost = $C_{10}$
4. $H_1$ true and decide $H_1$: Cost = $C_{11}$


It is reasonable to assume that the cost of a correct decision is less than that of an
erroneous decision, that is, $C_{00} < C_{01}$ and $C_{11} < C_{10}$. Our objective is to find the
decision rule that minimizes the average cost $C$:

$$C = C_{00} P[\text{decide } H_0 \mid H_0]\, p_0 + C_{01} P[\text{decide } H_1 \mid H_0]\, p_0 + C_{10} P[\text{decide } H_0 \mid H_1]\, p_1 + C_{11} P[\text{decide } H_1 \mid H_1]\, p_1. \tag{8.88}$$


Each time we carry out this hypothesis test we can imagine that the following random
experiment is performed. The parameter $\Theta$ is selected at random from the set {0, 1} with
probabilities $p_0$ and $p_1 = 1 - p_0$. The value of $\Theta$ determines which hypothesis is true. We
cannot observe $\Theta$ directly, but we can collect the random sample $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$
in which the observations are distributed as per the true hypothesis. Let $\tilde{R}$ correspond
to the subset of the observation space that is mapped into the value 1 (decide $H_1$). $\tilde{R}$
corresponds to the rejection region in the previous section. Similarly, let $\tilde{R}^c$ correspond
to the subset that is mapped into the value 0 (decide $H_0$). The following theorem identifies
the decision rule that minimizes the average cost.
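Theorem Minimum Cost Hypothesis Test


The decision rule that minimizes the average cost is given by:

$$\text{Accept } H_0 \text{ if } \mathbf{x} \in \tilde{R}^c = \left\{ \mathbf{x} : \Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} < \frac{p_0(C_{01} - C_{00})}{p_1(C_{10} - C_{11})} \right\}$$

$$\text{Accept } H_1 \text{ if } \mathbf{x} \in \tilde{R} = \left\{ \mathbf{x} : \Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} \ge \frac{p_0(C_{01} - C_{00})}{p_1(C_{10} - C_{11})} \right\} \tag{8.89}$$

if $X$ is a continuous random variable, and by

$$\text{Accept } H_0 \text{ if } \mathbf{x} \in \tilde{R}^c = \left\{ \mathbf{x} : \Lambda(\mathbf{x}) = \frac{p_X(\mathbf{x} \mid H_1)}{p_X(\mathbf{x} \mid H_0)} < \frac{p_0(C_{01} - C_{00})}{p_1(C_{10} - C_{11})} \right\}$$

$$\text{Accept } H_1 \text{ if } \mathbf{x} \in \tilde{R} = \left\{ \mathbf{x} : \Lambda(\mathbf{x}) = \frac{p_X(\mathbf{x} \mid H_1)}{p_X(\mathbf{x} \mid H_0)} \ge \frac{p_0(C_{01} - C_{00})}{p_1(C_{10} - C_{11})} \right\} \tag{8.90}$$

if $X$ is a discrete random variable.
We will prove the theorem at the end of the section.


We already encountered $\Lambda(\mathbf{x})$, the likelihood ratio function, in our discussion
of the Neyman-Pearson rule. The above decision rules are of threshold type and can
involve the likelihood ratio function or the log likelihood ratio function:

$$\Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \kappa \quad \text{or} \quad \ln \Lambda(\mathbf{x}) = \ln \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \ln \kappa.$$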


Example 8.31 Binary Communications 


A binary transmission system accepts a binary input $\Theta$ from an information source. The transmitter
sends a −1 or +1 signal according to whether $\Theta = 0$ or $\Theta = 1$. The received signal is equal to
the transmitted signal plus a Gaussian noise voltage that has zero mean and unit variance. Suppose
that each information bit is transmitted $n$ times. Find a decision rule for the receiver that
minimizes the probability of error.

An error occurs if $\Theta = 0$ and we decide 1, or if $\Theta = 1$ and we decide 0. If we let
$C_{00} = C_{11} = 0$ and $C_{01} = C_{10} = 1$, then the average cost is the probability of error:

$$C = P[\text{decide } H_1 \mid H_0]\, p_0 + P[\text{decide } H_0 \mid H_1]\, p_1 = P[\text{error}].$$

Each channel output is a Gaussian random variable with mean given by the input signal and
unit variance. Each input signal is transmitted $n$ times and we assume that the noise values are
independent. The pdf’s of the $n$ observations are given by:

$$f_X(\mathbf{x} \mid H_0) = \frac{1}{(2\pi)^{n/2}}\, e^{-((x_1+1)^2 + (x_2+1)^2 + \cdots + (x_n+1)^2)/2}$$

$$f_X(\mathbf{x} \mid H_1) = \frac{1}{(2\pi)^{n/2}}\, e^{-((x_1-1)^2 + (x_2-1)^2 + \cdots + (x_n-1)^2)/2}.$$

For these costs the threshold in the minimum cost test is $p_0/p_1$, so the likelihood ratio test is

$$\Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} = e^{2(x_1 + x_2 + \cdots + x_n)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{p_0}{p_1},$$

which reduces to:

$$\bar{X}_n \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2n} \ln \frac{p_0}{p_1} = \gamma.$$

It is interesting to see how the decision threshold $\gamma$ varies with the a priori probabilities and
the number of transmissions. If the inputs are equiprobable, then $p_0 = p_1$ and the threshold is
always zero. However, if we know 1’s are much more frequent, i.e., $p_1 \gg p_0$, then the threshold
$\gamma$ decreases, thereby expanding the rejection region $\tilde{R} = \{\bar{X}_n > \gamma\}$. Thus this a priori
knowledge biases the decision mechanism in favor of $H_1$. As we increase the number of transmissions
$n$, the information from the observations becomes more important than the a priori knowledge.
This effect is evident in the decrease of $\gamma$ to zero as $n$ is increased.
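
The behavior of the threshold can be explored with a minimal Python sketch of the receiver. The function names `decision_threshold` and `decide` and the sample values are hypothetical; ties are resolved in favor of $H_1$, following the convention of Eq. (8.89).

```python
import math

def decision_threshold(n, p0):
    """Threshold gamma = (1 / 2n) ln(p0 / p1) for n repeated transmissions."""
    p1 = 1.0 - p0
    return math.log(p0 / p1) / (2 * n)

def decide(samples, p0):
    """Return 1 (decide H1) if the sample mean reaches the threshold, else 0."""
    n = len(samples)
    sample_mean = sum(samples) / n
    return 1 if sample_mean >= decision_threshold(n, p0) else 0

# Equiprobable inputs give a zero threshold for any n.
print(decision_threshold(5, 0.5))

# With 1's more frequent (p0 = 0.2) the threshold is negative and shrinks toward 0.
for n in (1, 2, 10, 100):
    print(n, decision_threshold(n, 0.2))

# Hypothetical received values for one information bit.
print(decide([0.2, -0.1, 0.4], 0.5))
```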


Example 8.32 MAP Receiver for Binary Communications 


The Maximum A Posteriori (MAP) receiver selects the input that has the larger a posteriori
probability given the observed output. The MAP receiver uses the following decision rule:

$$\text{Accept } H_0 \text{ if } f_X(\mathbf{x} \mid H_1)\, p_1 < f_X(\mathbf{x} \mid H_0)\, p_0$$

$$\text{Accept } H_1 \text{ otherwise.} \tag{8.91}$$

The receiver in the previous example is the MAP receiver. To see this, note that the likelihood
ratio function and threshold are:

$$\Lambda(\mathbf{x}) = \frac{f_X(\mathbf{x} \mid H_1)}{f_X(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{p_0(C_{01} - C_{00})}{p_1(C_{10} - C_{11})} = \frac{p_0}{p_1},$$

which is equivalent to

$$f_X(\mathbf{x} \mid H_1)\, p_1 \underset{H_0}{\overset{H_1}{\gtrless}} f_X(\mathbf{x} \mid H_0)\, p_0.$$

The decision rule in the previous example minimizes the probability of error. Therefore we
conclude that the MAP receiver minimizes the probability of error.


Example 8.33 Server Allocation 


Jobs arrive at a service station at rate $\alpha_0$ jobs per minute or rate $\alpha_1 = 2\alpha_0$ jobs per minute. A
supervisor counts the number of arrivals in the first minute to decide which arrival rate is present,
and based on that count decides whether to allocate one processor or two processors to the service
station. Find a minimum cost rule for this problem.

We assume that the number of arrivals is a Poisson random variable with one of the two
means, so we are testing the following hypotheses:

$$H_0: p_X(k \mid H_0) = \frac{\alpha_0^k}{k!}\, e^{-\alpha_0}$$

$$H_1: p_X(k \mid H_1) = \frac{\alpha_1^k}{k!}\, e^{-\alpha_1}.$$

Let the costs be given as follows:

$$C_{00} = S - r, \quad C_{01} = 2S - r, \quad C_{10} = S, \quad \text{and} \quad C_{11} = 2S - 2r,$$

where $S$ is the cost of each server and $r$ is a unit of revenue. The term $C_{10}$ indicates that no revenue
is earned when the arrival rate is $\alpha_1$ and there is only one server.
The minimum cost test is obtained from the likelihood ratio:

$$\Lambda(k) = \frac{p_X(k \mid H_1)}{p_X(k \mid H_0)} = \frac{\alpha_1^k e^{-\alpha_1}/k!}{\alpha_0^k e^{-\alpha_0}/k!} = \left(\frac{\alpha_1}{\alpha_0}\right)^k e^{-(\alpha_1 - \alpha_0)}.$$

The log likelihood ratio test is then:

$$\ln \Lambda(k) = k \ln \frac{\alpha_1}{\alpha_0} - (\alpha_1 - \alpha_0) \underset{H_0}{\overset{H_1}{\gtrless}} \ln \frac{p_0 S}{p_1 (2r - S)},$$

which, since $\alpha_1/\alpha_0 = 2$, reduces to

$$k \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(\alpha_1 - \alpha_0) + \ln \dfrac{p_0 S}{p_1(2r - S)}}{\ln 2} = \frac{\alpha_0}{\ln 2} + \frac{1}{\ln 2} \ln \frac{p_0 S}{p_1 (2r - S)} = \gamma.$$

It is interesting to examine how the parameter values affect the threshold. The term $p_0 S$ is
the average cost when the lower rate is present and contains an extra cost of $S$ due to false
alarms. The term $p_1(2r - S)$ is the average cost when the higher rate is present and it contains a
loss in revenue due to not detecting the presence of the higher arrival rate. If the false alarm cost is
higher than the miss cost, then the threshold $\gamma$ increases, thus expanding the acceptance region. This
makes sense since we are motivated to have fewer false alarms. Conversely, the rejection region
expands when the miss cost is higher.
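
The threshold on the arrival count can be evaluated with a short Python sketch. The function names and the parameter values ($\alpha_0 = 2$, $p_0 = 0.5$, $S = 1$, $r = 1$) are assumptions made for illustration; the sketch also checks that comparing the count with $\gamma$ gives the same decision as comparing the likelihood ratio $2^k e^{-\alpha_0}$ with the cost threshold directly.

```python
import math

def cost_threshold(p0, S, r):
    """Likelihood ratio threshold p0 S / (p1 (2r - S)) for the costs of this example."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if 2 * r <= S:
        raise ValueError("the test needs C10 > C11, i.e. 2r > S")
    return p0 * S / ((1.0 - p0) * (2 * r - S))

def arrival_count_threshold(alpha0, p0, S, r):
    """Threshold gamma on the arrival count k when alpha1 = 2 * alpha0."""
    return (alpha0 + math.log(cost_threshold(p0, S, r))) / math.log(2)

def likelihood_ratio(k, alpha0):
    """Lambda(k) = (alpha1 / alpha0)^k exp(-(alpha1 - alpha0)) with alpha1 = 2 * alpha0."""
    return 2.0 ** k * math.exp(-alpha0)

# Illustrative parameter values (assumed, not from the text).
alpha0, p0, S, r = 2.0, 0.5, 1.0, 1.0
gamma = arrival_count_threshold(alpha0, p0, S, r)
kappa = cost_threshold(p0, S, r)
for k in range(6):
    by_count = k >= gamma
    by_ratio = likelihood_ratio(k, alpha0) >= kappa
    print(k, by_count, by_ratio)
```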


Proof of Minimum Cost Theorem 


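To prove the minimum cost theorem we evaluate the probabilities in Eq. (8.88) by noting,
for example, that $P[\text{decide } H_1 \mid H_0]$ is the probability that $\mathbf{X}_n$ is in $\tilde{R}$ when $H_0$ is
true. Proceeding in such fashion, we obtain:

$$C = C_{00} \int_{\tilde{R}^c} f_X(\mathbf{x} \mid H_0)\, p_0\, d\mathbf{x} + C_{01} \int_{\tilde{R}} f_X(\mathbf{x} \mid H_0)\, p_0\, d\mathbf{x} + C_{10} \int_{\tilde{R}^c} f_X(\mathbf{x} \mid H_1)\, p_1\, d\mathbf{x} + C_{11} \int_{\tilde{R}} f_X(\mathbf{x} \mid H_1)\, p_1\, d\mathbf{x}. \tag{8.92}$$

Since $\tilde{R}$ and $\tilde{R}^c$ cover the entire observation space, we have

$$\int_{\tilde{R}^c} f_X(\mathbf{x} \mid H_i)\, d\mathbf{x} = 1 - \int_{\tilde{R}} f_X(\mathbf{x} \mid H_i)\, d\mathbf{x}.$$

Therefore

$$C = C_{00}\, p_0 \left\{ 1 - \int_{\tilde{R}} f_X(\mathbf{x} \mid H_0)\, d\mathbf{x} \right\} + C_{01} \int_{\tilde{R}} f_X(\mathbf{x} \mid H_0)\, p_0\, d\mathbf{x} + C_{10}\, p_1 \left\{ 1 - \int_{\tilde{R}} f_X(\mathbf{x} \mid H_1)\, d\mathbf{x} \right\} + C_{11} \int_{\tilde{R}} f_X(\mathbf{x} \mid H_1)\, p_1\, d\mathbf{x}$$

$$= C_{00}\, p_0 + C_{10}\, p_1 + \int_{\tilde{R}} \left\{ (C_{01} - C_{00}) f_X(\mathbf{x} \mid H_0)\, p_0 - (C_{10} - C_{11}) f_X(\mathbf{x} \mid H_1)\, p_1 \right\} d\mathbf{x}. \tag{8.93}$$

We can deduce the minimum cost function from Eq. (8.93). The first two terms
are fixed-cost components. The term inside the brace is the difference of two positive
terms:

$$(C_{01} - C_{00}) f_X(\mathbf{x} \mid H_0)\, p_0 - (C_{10} - C_{11}) f_X(\mathbf{x} \mid H_1)\, p_1. \tag{8.94}$$

We claim that the minimum cost decision rule always selects an observation point $\mathbf{x}$ to
be in $\tilde{R}$ if the above term is negative. By doing so, it minimizes the overall cost. Including
in $\tilde{R}$ points $\mathbf{x}$ for which the above term is positive would only increase the overall
cost and contradict the claim that the cost is minimum. Therefore, the minimum cost
decision function selects $H_1$ if

$$(C_{01} - C_{00}) f_X(\mathbf{x} \mid H_0)\, p_0 < (C_{10} - C_{11}) f_X(\mathbf{x} \mid H_1)\, p_1$$

and $H_0$ otherwise. This is equivalent to the decision rule in the theorem.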


Bayes Estimation 


The framework for hypothesis testing that we described above can also be applied to
parameter estimation. To estimate a parameter we assume the following situation. We
suppose that the parameter is a random variable $\Theta$ with a known a priori distribution.
A random experiment is performed by “nature” to determine the value of $\Theta = \theta$ that
is present. We cannot observe $\theta$ directly, but we can observe the random sample
$\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$, which is distributed according to the active value of $\theta$. Our
objective is to obtain an estimator $g(\mathbf{X}_n)$ which minimizes a cost function that depends
on $g(\mathbf{X}_n)$ and $\theta$:

$$C = E[C(g(\mathbf{X}_n), \Theta)] = \int_{\theta} \int_{\mathbf{x}} C(g(\mathbf{x}), \theta)\, f_X(\mathbf{x} \mid \theta)\, f_\Theta(\theta)\, d\mathbf{x}\, d\theta. \tag{8.95}$$

If the cost function is the squared error, $C(g(\mathbf{X}), \Theta) = (g(\mathbf{X}) - \Theta)^2$, we have the
mean square estimation problem. In Chapter 6 we showed that the optimum estimator
is the conditional expected value of $\Theta$ given $\mathbf{X}_n$: $E[\Theta \mid \mathbf{X}_n]$.

Another cost function of interest is $C(g(\mathbf{X}), \Theta) = |g(\mathbf{X}) - \Theta|$, for which it can be
shown that the optimum estimator is the median of the a posteriori pdf $f_\Theta(\theta \mid \mathbf{X})$. A
third cost function of interest is:

$$C(g(\mathbf{X}), \Theta) = \begin{cases} 1 & \text{if } |g(\mathbf{X}) - \Theta| > \delta \\ 0 & \text{if } |g(\mathbf{X}) - \Theta| \le \delta. \end{cases} \tag{8.96}$$

This cost function is analogous to the cost function in Example 8.31 in that the cost is always
equal to 1 except when the estimate is within $\delta$ of the true parameter value $\theta$. It can be
shown that the best estimator for this cost function is the MAP estimator which maximizes
the a posteriori probability $f_\Theta(\theta \mid \mathbf{X})$. We examine these estimators in the Problems.

We conclude with an estimator discovered by Bayes and which gave birth to the 
approach developed in this section. The approach was quite controversial because the 
use of an a priori distribution leads to two different interpretations of the meaning of 
probability. See [Bulmer, p. 169] for an interesting discussion on this controversy. In prac- 
tice, we do encounter many situations where we have a priori knowledge of the parame- 
ters of interest. In such cases, Bayes’ methods have proved to be very useful. 


Example 8.34 Estimating p in n Bernoulli Trials 


Let $\mathbf{I}_n = (I_1, I_2, \ldots, I_n)$ be the outcomes of $n$ Bernoulli trials. Find the Bayes estimator for the
probability of success $p$, assuming that $p$ is a random variable that is uniformly distributed in the
unit interval. Use the squared error cost function.

The probability for the sequence of outcomes $i_1, i_2, \ldots, i_n$ is:

$$P[\mathbf{I}_n = (i_1, i_2, \ldots, i_n) \mid p] = p^{i_1}(1-p)^{1-i_1}\, p^{i_2}(1-p)^{1-i_2} \cdots p^{i_n}(1-p)^{1-i_n} = p^{\sum_j i_j} (1-p)^{n - \sum_j i_j} = p^k (1-p)^{n-k},$$

where $k$ is the number of successes in the $n$ trials. The probability of the sequence $i_1, i_2, \ldots, i_n$
over all possible values of $p$ is:

$$P[\mathbf{I}_n = (i_1, i_2, \ldots, i_n)] = \int_0^1 P[\mathbf{I}_n = (i_1, i_2, \ldots, i_n) \mid p]\, f_P(p)\, dp = \int_0^1 p^k (1-p)^{n-k}\, dp,$$

where $f_P(p) = 1$ is the a priori pdf of $p$. In Problem 8.92, we show that:

$$\int_0^1 t^k (1-t)^{n-k}\, dt = \frac{k!\,(n-k)!}{(n+1)!}. \tag{8.97}$$

The a posteriori pdf of $p$, given the observation $i_1, i_2, \ldots, i_n$, is then:

$$f_P(p \mid i_1, i_2, \ldots, i_n) = \frac{p^k (1-p)^{n-k} f_P(p)}{\int_0^1 t^k (1-t)^{n-k}\, dt} = \frac{(n+1)!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}.$$

The a posteriori pdf for the parameter $p$ depends on the observations only through the
total number of successes $k$. The best estimator for $p$ in the mean square sense is given by the
conditional expected value of $p$ given $i_1, i_2, \ldots, i_n$:

$$g(i_1, i_2, \ldots, i_n) = \int_0^1 p\, f_P(p \mid i_1, i_2, \ldots, i_n)\, dp = \int_0^1 p\, \frac{(n+1)!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}\, dp$$

$$= \frac{(n+1)!}{k!\,(n-k)!} \int_0^1 p^{k+1} (1-p)^{n-k}\, dp = \frac{(n+1)!}{k!\,(n-k)!} \cdot \frac{(k+1)!\,(n-k)!}{(n+2)!} = \frac{k+1}{n+2}. \tag{8.98}$$

This estimator differs from the maximum likelihood estimator which we found to be given
by the relative frequency in Example 8.10. For large $n$, the two estimators are in agreement
if $k$ is also large. Problem 8.92 considers the more general case where $p$ has a beta a priori
distribution.
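
The closed form $(k+1)/(n+2)$ can be checked with exact rational arithmetic. The Python sketch below evaluates the integral of Eq. (8.97) with factorials and forms the ratio of the two integrals that defines the conditional mean; the function names and the counts $k = 3$, $n = 10$ are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def beta_integral(k, n):
    """Integral of t^k (1 - t)^(n - k) over [0, 1], as given by Eq. (8.97)."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

def bayes_estimate(k, n):
    """Conditional mean of p for a uniform prior, given k successes in n trials.

    Ratio of the integral of p^(k+1) (1-p)^(n-k) to that of p^k (1-p)^(n-k).
    """
    return beta_integral(k + 1, n + 1) / beta_integral(k, n)

def ml_estimate(k, n):
    """Relative frequency, the maximum likelihood estimate."""
    return Fraction(k, n)

k, n = 3, 10  # hypothetical counts
print(bayes_estimate(k, n), ml_estimate(k, n))
print(bayes_estimate(300, 1000), ml_estimate(300, 1000))
```

With no observations at all ($n = 0$) the Bayes estimate is 1/2, the mean of the uniform prior, whereas the relative frequency is undefined.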


TESTING THE FIT OF A DISTRIBUTION TO DATA 


How well does the model fit the data? Suppose you have postulated a probability
model for some random experiment, and you are now interested in determining how
well the model fits your experimental data. How do you test this hypothesis? In this
section we present the chi-square test, which is widely used to determine the goodness
of fit of a distribution to a set of experimental data.


FIGURE 8.6
Histogram of last digit in telephone numbers.

The natural first test to carry out is an “eyeball” comparison of the postulated 
pmf, pdf, or cdf and an experimentally determined counterpart. If the outcome of the 
experiment, X, is discrete, then we can compare the relative frequency of outcomes 
with the probability specified by the pmf, as shown in Fig 8.6. If X is continuous, then 
we can partition the real axis into K mutually exclusive intervals and determine the rel- 
ative frequency with which outcomes fall into each interval. These numbers would be 
compared to the probability of X falling in the interval, as shown in Fig 8.7. If the rela- 
tive frequencies and corresponding probabilities are in good agreement, then we have 
established that a good fit exists. 

We now show that the approach outlined above leads to a test involving the
multinomial distribution. Suppose that there are $K$ intervals. Let $p_i$ be the probability
that $X$ falls in the $i$th interval. Since the intervals are selected to be a partition of the
range of $X$, we have that $p_1 + p_2 + \cdots + p_K = 1$. Suppose we perform the experiment
$n$ independent times and let $N_i$ be the number of times the outcome is in the $i$th
interval. Let $(N_1, N_2, \ldots, N_K)$ be the vector of interval counts, then $(N_1, N_2, \ldots, N_K)$
has a multinomial pmf:

$$P[(N_1, N_2, \ldots, N_K) = (n_1, n_2, \ldots, n_K)] = \frac{n!}{n_1!\, n_2! \cdots n_K!}\, p_1^{n_1} p_2^{n_2} \cdots p_K^{n_K},$$

where $n_j \ge 0$ and $n_1 + n_2 + \cdots + n_K = n$.


FIGURE 8.7
Histogram of computer simulation of exponential random variables.
(Observed and expected number of occurrences versus interval number.)


First we show that the relative frequencies of the interval counts are a maximum
likelihood estimator for the $K - 1$ independent parameters $p_1, p_2, \ldots, p_{K-1}$. Note
that $p_K$ is determined by the other $K - 1$ probabilities. Suppose we perform the experiment
$n$ times and observe a sequence of outcomes with counts $(n_1, n_2, \ldots, n_K)$.
The likelihood of this sequence is:

$$P[\mathbf{N} = (n_1, n_2, \ldots, n_K) \mid p_1, p_2, \ldots, p_{K-1}] = p_1^{n_1} p_2^{n_2} \cdots p_K^{n_K}$$

and the log likelihood is:

$$\ln P[\mathbf{N} = (n_1, n_2, \ldots, n_K) \mid p_1, p_2, \ldots, p_{K-1}] = \sum_{j=1}^{K} n_j \ln p_j.$$

We take derivatives with respect to $p_i$ and set the result equal to zero. For $i = 1, \ldots, K - 1$:

$$0 = \frac{\partial}{\partial p_i} \sum_{j=1}^{K} n_j \ln p_j = \sum_{j=1}^{K} \frac{n_j}{p_j} \frac{\partial p_j}{\partial p_i} = \frac{n_i}{p_i} + \frac{n_K}{p_K} \frac{\partial p_K}{\partial p_i} = \frac{n_i}{p_i} + \frac{n_K}{p_K} \frac{\partial}{\partial p_i} \left\{ 1 - \sum_{j=1}^{K-1} p_j \right\} = \frac{n_i}{p_i} - \frac{n_K}{p_K},$$

where we have noted that $p_K$ depends on $p_i$. The above equation implies that
$p_i = p_K n_i / n_K$, which in turn implies that the maximum likelihood estimates must satisfy

$$p_K = 1 - \sum_{i=1}^{K-1} p_i = 1 - p_K \sum_{i=1}^{K-1} \frac{n_i}{n_K} = 1 - p_K\, \frac{n - n_K}{n_K}.$$

This last equation implies that $p_K = n_K/n$, and $p_i = n_i/n$ for $i = 1, 2, \ldots, K - 1$.
Therefore the relative frequencies of the counts provide maximum likelihood estimates
for the interval probabilities. As $n$ increases we expect that the relative frequency
estimates will approach the true probabilities.
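
As a quick numerical illustration of this result, the following Python sketch compares the log likelihood at the relative frequencies with its value at a few other probability vectors. The counts and the alternative vectors are hypothetical values chosen only for the check.

```python
import math

def log_likelihood(counts, probs):
    """Sum of n_j ln p_j over the intervals (terms with n_j = 0 contribute nothing)."""
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

counts = [3, 5, 2]                           # hypothetical interval counts, n = 10
n = sum(counts)
relative_freqs = [c / n for c in counts]     # maximum likelihood estimates n_j / n

alternatives = [
    [0.2, 0.5, 0.3],
    [1 / 3, 1 / 3, 1 / 3],
    [0.35, 0.45, 0.20],
]
best = log_likelihood(counts, relative_freqs)
for q in alternatives:
    print(q, log_likelihood(counts, q) <= best)
```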

We next consider a test statistic that measures the deviation from the expected
count for each interval, that is, $m_i = np_i$:

$$D^2 = \sum_{i=1}^{K} c_i (N_i - np_i)^2.$$

The purpose of the term $c_i$ is to ensure that the terms in the sum have good asymptotic
properties as $n$ becomes large. The choice of $c_i = 1/np_i$ results in the above sum
approaching a chi-square distribution with $K - 1$ degrees of freedom as $n$ becomes
large. We will not present the proof of this result, which can be found in [Cramer, p. 417].
The chi-square goodness-of-fit test involves calculating $D^2$ and using an associated
significance test. A threshold is selected to provide the desired significance level. The
chi-square test is performed as follows:


1. Partition the sample space $S_X$ into the union of $K$ disjoint intervals.

2. Compute the probability $p_k$ that an outcome falls in the $k$th interval under the
assumption that $X$ has the postulated distribution. Then $m_k = np_k$ is the expected
number of outcomes that fall in the $k$th interval in $n$ repetitions of the experiment.
(To see this, imagine performing Bernoulli trials in which a “success” corresponds
to an outcome in the $k$th interval.)

3. The chi-square statistic is defined as the weighted difference between the observed
number of outcomes, $n_k$, that fall in the $k$th interval, and the expected number $m_k$:

$$D^2 = \sum_{k=1}^{K} \frac{(n_k - m_k)^2}{m_k}. \tag{8.99}$$

4. If the fit is good, then $D^2$ will be small. Therefore the hypothesis is rejected if $D^2$ is
too large, that is, if $D^2 \ge t_\alpha$, where $t_\alpha$ is a threshold determined by the significance
level of the test.


The chi-square test is based on the fact that for large $n$, the random variable $D^2$
has a pdf that is approximately a chi-square pdf with $K - 1$ degrees of freedom. Thus
the threshold $t_\alpha$ can be computed by finding the point at which

$$P[D^2 > \chi^2_{\alpha,K-1}] = \alpha,$$

where $D^2$ is taken to be a chi-square random variable with $K - 1$ degrees of freedom (see Fig. 8.8).
The thresholds for 1% and 5% significance levels and various degrees of freedom are
given in Table 8.3.
FIGURE 8.8
Threshold in chi-square test is selected so that $P[D^2 > \chi^2_{\alpha,K-1}] = \alpha$.


Example 8.35 


The histogram over the set {0, 1, 2, ..., 9} in Fig. 8.6 was obtained by taking the last digit of 114
telephone numbers in one column in a telephone directory. Are these observations consistent
with the assumption that they have a discrete uniform pmf?
If the outcomes are uniformly distributed, then each has probability 1/10. The expected number
of occurrences of each outcome in 114 trials is 114/10 = 11.4. The chi-square statistic is then

$$D^2 = \frac{(7 - 11.4)^2}{11.4} + \frac{(16 - 11.4)^2}{11.4} + \cdots + \frac{(7 - 11.4)^2}{11.4} = 9.51.$$

The number of degrees of freedom is $K - 1 = 10 - 1 = 9$, so from Table 8.3 the threshold for
a 1% significance level is 21.7. $D^2$ does not exceed the threshold, so we conclude that the data
are consistent with a uniformly distributed random variable.


Example 8.36 


The histogram in Fig. 8.7 was obtained by generating 1000 samples from a program designed 
to generate exponentially distributed random variables with parameter 1. The histogram was 
obtained by dividing the positive real line into 20 intervals of equal length 0.2. The exact num- 
bers are given in Table 8.6. A second histogram was also taken using 20 intervals of equal 
probability. The numbers for this histogram are given in Table 8.7. 

From Table 8.3 we find that the threshold for a 5% significance level and $20 - 1 = 19$ degrees of
freedom is 30.1. The chi-square values for the two histograms are 14.1 and 11.6, respectively. Both histograms pass the
goodness-of-fit test in this case, but it is apparent that the method of selecting the intervals can 
significantly affect the value of the chi-square measure. 


Example 8.36 shows that there are many ways of selecting the intervals in the
partition, and that these can yield different results. The following rules of thumb are
recommended. First, to the extent possible the intervals should be selected so that they
are equiprobable. Second, the intervals should be selected so that the expected number
of outcomes in each interval is five or more. This improves the accuracy of approximating
the cdf of $D^2$ by a chi-square cdf.


TABLE 8.6 Chi-square test for exponential random variable,
equal-length intervals.


Interval   Observed   Expected   (O − E)²/E
0          190        181.3      0.417484
1          144        148.4      0.130458
2          102        121.5      3.129629
3          96         99.5       0.123115
4          86         81.44      0.255324
5          67         66.7       0.001349
6          59         54.6       0.354578
7          43         44.7       0.064653
8          51         36.6       5.665573
9          28         30         0.133333
10         28         24.5       0.5
11         19         20.1       0.060199
12         15         16.4       0.119512
13         12         13.5       0.166666
14         11         11         0
15         7          9          0.444444
16         9          7.4        0.345945
17         5          6          0.166666
18         8          5          1.8
≥19        20         22.4       0.257142


Chi-square value = 14.13607
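
The Expected column of Table 8.6 follows from the exponential cdf with parameter 1: the probability of the $k$th interval of length 0.2 is $e^{-0.2k} - e^{-0.2(k+1)}$, and the last, open-ended interval has probability $e^{-3.8}$. The Python sketch below reproduces these counts; the function name and its arguments are illustrative choices.

```python
import math

def exponential_expected_counts(n_samples, width, n_intervals, rate=1.0):
    """Expected counts for equal-length intervals; the last interval is open-ended."""
    counts = []
    for k in range(n_intervals - 1):
        left, right = k * width, (k + 1) * width
        prob = math.exp(-rate * left) - math.exp(-rate * right)
        counts.append(n_samples * prob)
    # Tail probability P[X > (n_intervals - 1) * width] for the final interval.
    counts.append(n_samples * math.exp(-rate * (n_intervals - 1) * width))
    return counts

expected = exponential_expected_counts(n_samples=1000, width=0.2, n_intervals=20)
for k, m in enumerate(expected):
    print(k, round(m, 1))
```

The smallest expected counts occur in the last equal-length intervals, which is where the second rule of thumb becomes relevant.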

The discussion so far has assumed that the postulated distribution is completely
specified. In the typical case, however, one or two parameters of the distribution, namely
the mean and variance, are estimated from the data. It is often recommended that if $r$ of
the parameters of a cdf are estimated from the data, then $D^2$ is better approximated by a
chi-square distribution with $K - r - 1$ degrees of freedom. See [Allen, p. 308]. In effect,
each estimated parameter decreases the degrees of freedom by 1.


Example 8.37 


The histogram in Table 8.8 was reported by Rutherford, Chadwick, and Ellis in a famous paper 
published in 1920. The number of particles emitted by a radioactive mass in a time period of 
7.5 seconds was counted. A total number of 2608 periods were observed. It is postulated that 
the number of particles emitted in a time period is a random variable with a Poisson distribu- 
tion. Perform the chi-square goodness-of-fit test. 

In this case, the mean of the Poisson distribution is unknown, so it is estimated from the
data to be 3.870. $D^2$ for $12 - 1 - 1 = 10$ degrees of freedom is then 12.94. The threshold at a
1% significance level is 23.2. $D^2$ does not exceed this, so we conclude that the data are in good
agreement with the Poisson distribution.
TABLE 8.7 Chi-square test for exponential random variable,
equiprobable intervals.


Interval Observed Expected (O − E)²/E
0 49 50 0.02 
1 61 50 2.42 
2 50 50 0 
3 50 50 0 
4 40 50 2 
5 52 50 0.08 
6 48 50 0.08 
7 40 50 2 
8 45 50 0.5 
9 46 50 0.32 
10 50 50 0 
11 51 50 0.02 
12 55 50 0.5 
13 49 50 0.02 
14 54 50 0.32 
15 52 50 0.08 
16 62 50 2.88 
17 46 50 0.32 
18 49 50 0.02 
19 51 50 0.02 


Chi-square value = 11.6 
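
The chi-square value of Table 8.7 can be recomputed directly from Eq. (8.99). In the Python sketch below the observed counts are copied from the table, the expected count is 1000/20 = 50 per interval, and the 5% threshold 30.1 is the value quoted in Example 8.36; the function name is an illustrative choice.

```python
def chi_square_statistic(observed, expected):
    """D^2 of Eq. (8.99): sum of (n_k - m_k)^2 / m_k over the intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts from Table 8.7 (20 equiprobable intervals, 1000 samples).
observed = [49, 61, 50, 50, 40, 52, 48, 40, 45, 46,
            50, 51, 55, 49, 54, 52, 62, 46, 49, 51]
expected = [sum(observed) / len(observed)] * len(observed)

THRESHOLD_5_PERCENT = 30.1  # 19 degrees of freedom, quoted from Table 8.3

d2 = chi_square_statistic(observed, expected)
print(f"D^2 = {d2:.2f}, reject fit: {d2 >= THRESHOLD_5_PERCENT}")
```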


TABLE 8.8 Chi-square test for Poisson
random variable.


Count   Observed   Expected   (O − E)²/E
0       57         54.40      0.12
1       203        210.50     0.27
2       383        407.40     1.46
3       525        525.50     0.00
4       532        508.40     1.10
5       408        393.50     0.53
6       273        253.80     1.45
7       139        140.30     0.01
8       45         67.80      7.67
9       27         29.20      0.17
10      10         11.30      0.15
≥11     6          5.80       0.01


Chi-square value = 12.94


Based on [Cramer, p. 436]. 
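
The Poisson fit of Example 8.37 can be sketched in Python as follows. The observed counts are taken from Table 8.8, assuming that the first count is 57, the value that makes the counts total the 2608 periods stated in the example. The expected counts are recomputed from the estimated mean 3.870, so the resulting $D^2$ differs slightly from the tabulated 12.94, which is based on rounded expected values.

```python
import math

# Observed numbers of periods with 0, 1, ..., 10 and >= 11 particles (Table 8.8).
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 6]
n_periods = sum(observed)
mean = 3.870  # Poisson mean estimated from the data, as stated in the example

# Poisson probabilities for counts 0..10; the last class collects the tail.
probs = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(11)]
probs.append(1.0 - sum(probs))
expected = [n_periods * p for p in probs]

d2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One parameter (r = 1) was estimated, so the degrees of freedom are K - r - 1.
dof = len(observed) - 1 - 1
THRESHOLD_1_PERCENT = 23.2  # 10 degrees of freedom, quoted in the example

print(f"D^2 = {d2:.2f} with {dof} degrees of freedom")
print("reject fit:", d2 >= THRESHOLD_1_PERCENT)
```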
A statistic is a function of a random sample that consists of n iid observations of
a random variable of interest. The sampling distribution is the pdf or pmf of the
statistic. The critical values of a given statistic are the interval endpoints at which
the complementary cdf achieves certain probabilities.


A point estimator is unbiased if its expected value equals the true value of the pa- 
rameter of interest, and it is consistent if it is asymptotically unbiased. The mean 
square error of an estimator is a measure of its accuracy. The sample mean and 
the sample variance are consistent estimators. 


Maximum likelihood estimators are obtained by working with the likelihood and 
log likelihood functions. Maximum likelihood estimators are consistent and their 
estimation error is asymptotically Gaussian and efficient. 


The Cramer-Rao inequality provides a way of determining whether an unbiased 
estimator achieves the minimum mean square error. An estimator that achieves 
the lower bound is said to be efficient. 


Confidence intervals provide an interval that is determined from observed data
and that by design contains a parameter of interest with a specified probability
level. We developed confidence intervals for binomial, Gaussian, Student’s t, and
chi-square sampling distributions.


When the number of samples n is large, the central limit theorem allows us to use 
estimators and confidence intervals for Gaussian random variables even if the 
random variable of interest is not Gaussian. 


The sample mean and sample variance for independent Gaussian random variables
are independent random variables. The chi-square and Student’s t-distributions are
derived from statistics involving Gaussian random variables.


A significance test is used to determine whether observed data are consistent 
with a hypothesized distribution. The level of significance of a test is the proba- 
bility that the hypothesis is rejected when it is actually true. 


A binary hypothesis test decides between a null hypothesis and an alternative hypothesis
based on observed data. A hypothesis is simple if the associated distribution
is specified completely. A hypothesis is composite if the associated
distribution is not specified completely.


Simple binary hypothesis tests are assessed in terms of their significance level and 
their Type II error probability or, equivalently, their power. The Neyman-Pearson 
test leads to a likelihood ratio test that meets a target Type I error probability 
while maximizing the power of the test. 


Bayesian models are based on the assumption of an a priori distribution for the 
parameters of interest, and they provide an alternative approach to assessing and 
deriving estimators and hypothesis tests. 


The chi-square distribution provides a significance test for the fit of observed 
data to a hypothetical distribution. 
CHECKLIST OF IMPORTANT TERMS


Acceptance region
Alternative hypothesis
Bayes decision rule
Bayes estimator
Chi-square goodness-of-fit test
Composite hypothesis
Confidence interval
Confidence level
Consistent estimator
Cramer-Rao inequality
Critical region
Critical value
Decision rule
Efficiency
False alarm probability
Fisher information
Invariance property
Likelihood function
Likelihood ratio function
Log likelihood function
Maximum likelihood method
Maximum likelihood test
Mean square estimation error
Method of batch means
Neyman-Pearson test
Normal random variable
Null hypothesis
Point estimator
Population
Power
Probability of detection
Random sample
Rejection region
Sampling distribution
Score function
Significance level
Significance test
Simple hypothesis
Statistic
Strongly consistent estimator
Type I error
Type II error
Unbiased estimator


ANNOTATED REFERENCES


Bulmer [1] is a classic introductory textbook on statistics. Ross [2] and Wackerly [3] 
provide excellent and up-to-date introductions to statistics. Bickel [4] provides a more 
advanced treatment. Cramer [5] is a classic text that provides careful development of 
many traditional statistical methods. Van Trees [6] has influenced the application of sta- 
tistical methods in modern communications. [10] provides a very useful online resource 
for learning probability and statistics. 
PROBLEMS 




Note: Statistics involves working with data. For this reason the problems in this section incor- 
porate exercises that involve the generation of random samples of random variables using the 
methods introduced in Chapters 3, 4, 5, and 6. These exercises can be skipped without loss of 
continuity. 


Section 8.1: Samples and Sampling Distributions 


8.1. Let X be a Gaussian random variable with mean 10 and variance 4. A sample of size 9 is 
obtained and the sample mean, minimum, and maximum of the sample are calculated. 

(a) Find the probability that the sample mean is less than 9. 
(b) Find the probability that the minimum is greater than 8. 
(c) Find the probability that the maximum is less than 12. 
(d) Find n so the sample mean is within 1 of the true mean with probability 0.95. 
(e) Generate 100 random samples of size 9. Compare the probabilities obtained in parts 
a, b, and c to the observed relative frequencies. 


8.2. The lifetime of a device is an exponential random variable with mean 50 months. A 
sample of size 25 is obtained and the sample mean, maximum, and minimum of the sample 
are calculated. 

(a) Estimate the probability that the sample mean differs from the true mean by more 
than 1 month. 
(b) Find the probability that the longest-lived sample is greater than 100 months. 
(c) Find the probability that the shortest-lived sample is less than 25 months. 
(d) Find n so the sample mean is within 5 months of the true mean with probability 0.9. 
(e) Generate 100 random samples of size 25. Compare the probabilities obtained in 
parts a, b, and c to the observed relative frequencies. 


8.3. Let the signal X be a uniform random variable in the interval $[-3, 3]$, and suppose that a 
sample of size 50 is obtained. 

(a) Estimate the probability that the sample mean is outside the interval $[-0.5, 0.5]$. 
(b) Estimate the probability that the maximum of the sample is less than 2.5. 
(c) Estimate the probability that the sample mean of the squares of the samples is 
greater than 3. 
(d) Generate 100 random samples of size 50. Compare the probabilities obtained in 
parts a, b, and c to the observed relative frequencies. 


8.4. Let X be a Poisson random variable with mean $\alpha = 2$, and suppose that a sample of size 
16 is obtained. 

(a) Estimate the probability that the sample mean is greater than 2.5. 
(b) Estimate the probability that the sample mean differs from the true mean by more 
than 0.5. 
(c) Find n so the sample mean differs from the true mean by more than 0.5 with 
probability 0.95. 
(d) Generate 100 random samples of size 16. Compare the probabilities obtained in 
parts a and b to the observed relative frequencies. 


8.5. The interarrival times of queries at a call center are exponential random variables with 
mean interarrival time 1/4. Suppose that a sample of size 9 is obtained. 

(a) The estimator $\hat{\lambda}_1 = 1/\bar{X}_9$ is used to estimate the arrival rate. Find the probability that 
the estimator differs from the true arrival rate by more than 1. 
(b) Suppose the estimator $\hat{\lambda}_2 = 1/(9 \min(X_1, \ldots, X_9))$ is used to estimate the arrival 
rate. Find the probability that the estimator differs from the true arrival rate by 
more than 1. 
(c) Generate 100 random samples of size 9. Compare the probabilities obtained in parts 
a and b to the observed relative frequencies. 


8.6. Let the sample $X_1, X_2, \ldots, X_n$ consist of iid versions of the random variable X. The 
method of moments involves estimating the moments of X as follows: 

$$\hat{m}_k = \frac{1}{n}\sum_{j=1}^{n} X_j^k.$$

(a) Suppose that X is a uniform random variable in the interval $[0, \theta]$. Use $\hat{m}_1$ to find an 
estimator for $\theta$. 
(b) Find the mean and variance of the estimator in part a. 


8.7. Let X be a gamma random variable with parameters $\alpha$ and $\beta = 1/\lambda$. 

(a) Use the first two moment estimators $\hat{m}_1$ and $\hat{m}_2$ of X (defined in Problem 8.6) to 
estimate the parameters $\alpha$ and $\beta$. 
(b) Describe the behavior of the estimators in part a as n becomes large. 


8.8. Let X = (X, Y) be a pair of random variables with known means, $\mu_1$ and $\mu_2$. Consider 
the following estimator for the covariance of X and Y: 

$$\hat{C}_{X,Y} = \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu_1)(Y_j - \mu_2).$$

(a) Find the expected value and variance of this estimator. 
(b) Explain the behavior of the estimator as n becomes large. 


8.9. Let X = (X, Y) be a pair of random variables with unknown means and covariances. 
Consider the following estimator for the covariance of X and Y: 

$$\hat{K}_{X,Y} = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X}_n)(Y_j - \bar{Y}_n).$$

(a) Find the expected value of this estimator. 
(b) Explain why the estimator approaches the estimator in Problem 8.8 for n large. Hint: 
See Eq. (8.15). 


8.10. Let the sample $X_1, X_2, \ldots, X_n$ consist of iid versions of the random variable X. Consider 
the maximum and minimum statistics for the sample: 

$$W = \min(X_1, \ldots, X_n) \quad\text{and}\quad Z = \max(X_1, \ldots, X_n).$$

(a) Show that the pdf of Z is $f_Z(x) = n[F_X(x)]^{n-1} f_X(x)$. 
(b) Show that the pdf of W is $f_W(x) = n[1 - F_X(x)]^{n-1} f_X(x)$. 
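
The maximum and minimum statistics of Problem 8.10 can be checked numerically. The sketch below assumes the Gaussian sample of Problem 8.1 (mean 10, variance 4, n = 9) and uses the fact that, for iid samples, the cdf of the maximum is the n-th power of the cdf of X, and the probability that the minimum exceeds x is the n-th power of 1 minus the cdf of X; differentiating these gives the pdfs stated in the problem. The function names are illustrative and not part of the text.

```python
import math
import random

def gaussian_cdf(x, mean, std):
    """Cdf of a Gaussian random variable, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def prob_max_below(x, n, cdf):
    """P[Z <= x] = F_X(x)^n for Z = max of n iid samples."""
    return cdf(x) ** n

def prob_min_above(x, n, cdf):
    """P[W > x] = (1 - F_X(x))^n for W = min of n iid samples."""
    return (1.0 - cdf(x)) ** n

def relative_frequencies(n, trials, rng):
    """Observed fraction of samples with max < 12 and with min > 8."""
    max_below = 0
    min_above = 0
    for _ in range(trials):
        sample = [rng.gauss(10.0, 2.0) for _ in range(n)]  # std = sqrt(4) = 2
        max_below += max(sample) < 12.0
        min_above += min(sample) > 8.0
    return max_below / trials, min_above / trials

if __name__ == "__main__":
    cdf = lambda x: gaussian_cdf(x, 10.0, 2.0)
    print("analytic :", prob_max_below(12.0, 9, cdf), prob_min_above(8.0, 9, cdf))
    print("observed :", relative_frequencies(9, 100, random.Random(1)))
```

With only 100 samples the observed relative frequencies fluctuate noticeably around the analytic values; increasing `trials` tightens the agreement.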
Section 8.2: Parameter Estimation 


8.11. Show that the mean square estimation error satisfies 
$E[(\hat{\Theta} - \theta)^2] = \mathrm{VAR}[\hat{\Theta}] + B(\hat{\Theta})^2$. 


8.12. Let the sample $X_1, X_2, X_3, X_4$ consist of iid versions of a Poisson random variable X with 
mean $\alpha = 4$. Find the mean and variance of the following estimators for $\alpha$ and determine 
whether they are biased or unbiased. 

(a) $\hat{\alpha}_1 = (X_1 + X_2)/2$. 
(b) $\hat{\alpha}_2 = (X_3 + X_4)/2$. 
(c) $\hat{\alpha}_3 = (X_1 + 2X_2)/3$. 
(d) $\hat{\alpha}_4 = (X_1 + X_2 + X_3 + X_4)/4$. 


8.13. (a) Let $\hat{\Theta}_1$ and $\hat{\Theta}_2$ be unbiased estimators for the parameter $\theta$. Show that the estimator 
$\hat{\Theta} = p\hat{\Theta}_1 + (1 - p)\hat{\Theta}_2$ is also an unbiased estimator for $\theta$, where $0 \le p \le 1$. 
(b) Find the value of p in part a that minimizes the mean square error. 
(c) Find the value of p that minimizes the mean square error if $\hat{\Theta}_1$ and $\hat{\Theta}_2$ are the 
estimators in Problems 8.12a and 8.12b. 
(d) Repeat part c for the estimators in Problems 8.12a and 8.12d. 
(e) Let $\hat{\Theta}_1$ and $\hat{\Theta}_2$ be unbiased estimators for the first and second moments of X. Find 
an estimator for the variance of X. Is it biased? 


8.14. The output of a communication system is $Y = \theta + N$, where $\theta$ is an input signal and N is 
a noise signal that is uniformly distributed in the interval [0, 2]. Suppose the signal is 
transmitted n times and that the noise terms are iid random variables. 

(a) Show that the sample mean of the outputs is a biased estimator for $\theta$. 
(b) Find the mean square error of the estimator. 


8.15. The number of requests at a Web server is a Poisson random variable X with mean $\alpha = 2$ 
requests per minute. Suppose that n 1-minute intervals are observed and that the number 
$N_0$ of intervals with zero arrivals is counted. The probability of zero arrivals is then 
estimated by $\hat{p}_0 = N_0/n$. To estimate the arrival rate $\alpha$, $\hat{p}_0$ is set equal to the probability of 
zero arrivals in one minute: 

$$\hat{p}_0 = N_0/n = P[X = 0] = e^{-\hat{\alpha}}.$$

(a) Solve the above equation for $\hat{\alpha}$ to obtain an estimator for the arrival rate. 
(b) Show that $\hat{\alpha}$ is biased. 
(c) Find the mean square error of $\hat{\alpha}$. 
(d) Is $\hat{\alpha}$ a consistent estimator? 


8.16. Generate 100 samples of size 20 of the Poisson random variables in Problem 8.15. 

(a) Estimate the arrival rate $\alpha$ using the sample mean estimator and the estimator from 
Problem 8.15. 
(b) Compare the bias and mean square error of the two estimators. 


8.17. To estimate the variance of a Bernoulli random variable X, we perform n iid trials and 
count the number of successes k and obtain the estimate $\hat{p} = k/n$. We then estimate the 
variance of X by 

$$\hat{\sigma}^2 = \hat{p}(1 - \hat{p}) = \frac{k}{n}\left(1 - \frac{k}{n}\right).$$

(a) Show that $\hat{\sigma}^2$ is a biased estimator for the variance of X. 
(b) Is $\hat{\sigma}^2$ a consistent estimator for the variance of X? 
(c) Find a constant c so that $c\hat{\sigma}^2$ is an unbiased estimator for the variance of X. 
(d) Find the mean square errors of the estimators in parts b and c. 


8.18. Let $X_1, X_2, \ldots, X_n$ be a random sample of a uniform random variable that is uniformly 
distributed in the interval $[0, \theta]$. Consider the following estimator for $\theta$: 

$$\hat{\Theta} = \max\{X_1, X_2, \ldots, X_n\}.$$

(a) Find the pdf of $\hat{\Theta}$ using the results of Problem 8.10. 
(b) Show that $\hat{\Theta}$ is a biased estimator. 
(c) Find the variance of $\hat{\Theta}$ and determine whether it is a consistent estimator. 
(d) Find a constant c so that $c\hat{\Theta}$ is an unbiased estimator. 
(e) Generate a random sample of 20 uniform random variables with $\theta = 5$. Compare 
the values provided by the two estimators in 100 separate trials. 
(f) Generate 1000 samples of the uniform random variable, updating the estimator 
value every 50 samples. Can you discern the bias of the estimator? 


8.19. Let $X_1, X_2, \ldots, X_n$ be a random sample of a Pareto random variable: 


with k = 2.5. Consider the estimator for $\theta$: 

$$\hat{\Theta} = \min\{X_1, X_2, \ldots, X_n\}.$$

(a) Show that $\hat{\Theta}$ is a biased estimator and find the bias. 
(b) Find the mean squared error of $\hat{\Theta}$. 
(c) Determine whether $\hat{\Theta}$ is a consistent estimator. 
(d) Use Octave to generate 1000 samples of the Pareto random variable. Update the 
estimator value every 50 samples. Can you discern the bias of the estimator? 
(e) Repeat part d with k = 1.5. What changes? 


8.20. Generate 100 samples of sizes 5, 10, 20 of exponential random variables with mean 1. 
Compare the histograms of the estimates given by the biased and unbiased estimators for 
the sample variance. 


8.21. Find the variance of the sample variance estimator in Example 8.8. Hint: Assume m = 0. 


8.22. Generate 100 samples of size 20 of pairs of zero-mean, unit-variance jointly Gaussian 
random variables with correlation coefficient $\rho = 0.50$. Compare the histograms of the 
estimates given by the estimators for the sample covariance in Problems 8.8 and 8.9. 


8.23. Repeat the scenario in Problem 8.22 for the following estimator for the correlation 
coefficient between two random variables X and Y: 

$$\hat{\rho}_{X,Y} = \frac{\sum_{j=1}^{n} (X_j - \bar{X}_n)(Y_j - \bar{Y}_n)}{\sqrt{\sum_{j=1}^{n} (X_j - \bar{X}_n)^2 \sum_{j=1}^{n} (Y_j - \bar{Y}_n)^2}}.$$
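
The simulation requested in Problems 8.22 and 8.23 could be set up along the following lines. This is a minimal sketch, not the text's own method: it assumes the known-mean estimator of Problem 8.8 uses the true means (zero here), that the estimator of Problem 8.9 divides by n - 1, and that correlated pairs are generated as X = Z1, Y = rho*Z1 + sqrt(1 - rho^2)*Z2 from independent unit Gaussians. All function names are hypothetical.

```python
import math
import random

def gaussian_pairs(n, rho, rng):
    """n pairs of zero-mean, unit-variance jointly Gaussian RVs with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs

def cov_known_means(pairs, m1=0.0, m2=0.0):
    """Covariance estimator with known means (as in Problem 8.8)."""
    return sum((x - m1) * (y - m2) for x, y in pairs) / len(pairs)

def cov_unknown_means(pairs):
    """Covariance estimator with sample means, dividing by n - 1 (as in Problem 8.9)."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    return sum((x - xbar) * (y - ybar) for x, y in pairs) / (n - 1)

def corr_coefficient(pairs):
    """Sample correlation coefficient (as in Problem 8.23)."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    syy = sum((y - ybar) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    rng = random.Random(2)
    # 100 samples of size 20; collect the estimates for a histogram.
    estimates = [corr_coefficient(gaussian_pairs(20, 0.5, rng)) for _ in range(100)]
    print(min(estimates), sum(estimates) / len(estimates), max(estimates))
```

The lists of estimates can be passed to any histogram routine to compare the spread of the three estimators.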
Let X be an exponential random variable with mean $1/\lambda$. 
(a) Find the maximum likelihood estimator $\hat{\Theta}_{ML}$ for $\theta = 1/\lambda$. 
(b) Find the maximum likelihood estimator $\hat{\Theta}_{ML}$ for $\theta = \lambda$. 
(c) Find the pdfs of the estimators in part a. 
(d) Is the estimator in part a unbiased and consistent? 
(e) Repeat 20 trials of the following experiment: Generate a sample of 16 observations 
of the exponential random variable with $\lambda = 1/2$ and find the values given 
by the estimators in parts a and b. Show a histogram of the values produced by 
the estimators. 
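
The experiment in part e might be organized as in the sketch below. It relies on the standard result that the maximum likelihood estimate of the exponential mean is the sample mean, so that the estimate of the rate is its reciprocal by the invariance property. The function names, the seed, and the use of Python's `random.expovariate` are illustrative choices, not part of the text.

```python
import random

def ml_estimates(sample):
    """Return (ML estimate of the mean 1/lambda, ML estimate of the rate lambda)."""
    mean_hat = sum(sample) / len(sample)  # sample mean
    rate_hat = 1.0 / mean_hat             # reciprocal, by the invariance property
    return mean_hat, rate_hat

def run_trials(trials=20, n=16, lam=0.5, seed=1):
    """Repeat the experiment: each trial draws n exponential observations with rate lam."""
    rng = random.Random(seed)
    return [ml_estimates([rng.expovariate(lam) for _ in range(n)])
            for _ in range(trials)]

if __name__ == "__main__":
    for mean_hat, rate_hat in run_trials():
        print(f"mean estimate {mean_hat:.3f}   rate estimate {rate_hat:.3f}")
```

The two columns of output can be fed to a histogram routine to compare the spread of the two estimators.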


Let $X = \theta + N$ be the output of a noisy channel where the input is the parameter $\theta$ and 
N is a zero-mean, unit-variance Gaussian random variable. Suppose that the output is 
measured n times to obtain the random sample $X_i = \theta + N_i$ for $i = 1, \ldots, n$. 


(a) Find the maximum likelihood estimator $\hat{\Theta}_{ML}$ for $\theta$. 
(b) Find the pdf of $\hat{\Theta}_{ML}$. 
(c) Determine whether $\hat{\Theta}_{ML}$ is unbiased and consistent. 


Show that the maximum likelihood estimator for a uniform random variable that is 
distributed in the interval $[0, \theta]$ is $\hat{\Theta} = \max\{X_1, X_2, \ldots, X_n\}$. Hint: You will need to show 
that the maximum occurs at an endpoint of the interval of parameter values. 


Let X be a Pareto random variable with parameters $\alpha$ and $x_m$. 
(a) Find the maximum likelihood estimator for $\alpha$ assuming $x_m$ is known. 
(b) Show that the maximum likelihood estimators for $\alpha$ and $x_m$ are: 

$$\hat{\alpha}_{ML} = \left[\frac{1}{n}\sum_{j=1}^{n} \log\frac{X_j}{\hat{x}_{m,ML}}\right]^{-1} \quad\text{and}\quad \hat{x}_{m,ML} = \min(X_1, X_2, \ldots, X_n).$$

(c) Discuss the behavior of the estimators in parts a and b as n becomes large and 
determine whether they are consistent. 

(d) Repeat five trials of the following experiment: Generate a sample of 100 observations 
of the Pareto random variable with $\alpha = 2.5$ and $x_m = 1$ and obtain the values 
given by the estimators in part b. Repeat for $\alpha = 1.5$ and $x_m = 1$, and $\alpha = 0.5$ and 
$x_m = 1$. 
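
A possible way to carry out part d is sketched below. It assumes the Pareto cdf $1 - (x_m/x)^{\alpha}$ for $x \ge x_m$, so that samples can be drawn by the inverse-transform method, and it evaluates the two estimators stated in part b. The helper names are hypothetical.

```python
import math
import random

def pareto_sample(n, alpha, xm, rng):
    """Inverse-transform sampling: X = xm * U**(-1/alpha), U uniform in (0, 1]."""
    return [xm * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def pareto_ml(sample):
    """ML estimates (alpha_hat, xm_hat) from part b; needs at least two distinct values."""
    xm_hat = min(sample)
    log_sum = sum(math.log(x / xm_hat) for x in sample)
    alpha_hat = len(sample) / log_sum
    return alpha_hat, xm_hat

if __name__ == "__main__":
    rng = random.Random(4)
    for alpha in (2.5, 1.5, 0.5):
        for trial in range(5):
            alpha_hat, xm_hat = pareto_ml(pareto_sample(100, alpha, 1.0, rng))
            print(f"alpha={alpha}: alpha_hat={alpha_hat:.3f}, xm_hat={xm_hat:.4f}")
```

Because the minimum of the sample can never fall below $x_m$, the estimate of $x_m$ always errs on the high side, which is worth keeping in mind when interpreting part c.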

(a) Show that the maximum likelihood estimator for the parameter $\theta = \alpha^2$ of the 
Rayleigh random variable is 

$$\hat{\alpha}^2_{ML} = \frac{1}{2n}\sum_{j=1}^{n} X_j^2.$$

(b) Is the estimator unbiased? 

(c) Repeat 50 trials of the following experiment: Generate a sample of 16 observations 
of the Rayleigh random variable with $\alpha = 2$ and find the values given by the 
estimator in part a. Show a histogram of the values produced by the estimator. 


(a) Show that the maximum likelihood estimator for $\theta = a$ of the beta random variable 
with b = 1 is 

$$\hat{a}_{ML} = \left[-\frac{1}{n}\sum_{j=1}^{n} \log X_j\right]^{-1}.$$

(b) Generate a sample of 100 observations of the beta random variable with b = 1 and 
a = 0.5 to obtain the estimate for a. Repeat for a = 1, a = 2, and a = 3. 


Let X be a Weibull random variable with parameters $\alpha$ and $\beta$ (see Eq. 4.102). 

(a) Assuming that $\beta$ is known, show that the maximum likelihood estimator for $\theta = \alpha$ is: 

$$\hat{\alpha}_{ML} = \left[\frac{1}{n}\sum_{j=1}^{n} X_j^{\beta}\right]^{-1}.$$

(b) Generate a sample of 100 observations of the Weibull random variable with $\alpha = 1$ 
and $\beta = 1$ to obtain the estimate for $\alpha$. Repeat for $\beta = 2$ and $\beta = 4$. 

A certain device is known to have an exponential lifetime. 

(a) Suppose that n devices are tested for T seconds, and the number of devices that fail 
within the testing period is counted. Find the maximum likelihood estimator for the 
mean lifetime of the device. Hint: Use the invariance property. 


(b) Repeat ten trials of the following experiment: Generate a sample of 16 observations 
of the exponential random variable with $\lambda = 1/10$ and testing period T = 15. Find 
the estimates for the mean lifetime using the method in part a and compare these 
with the estimates provided by Problem 8.24a. 

Let X be a gamma random variable with parameters $\alpha$ and $\lambda$. 

(a) Find the maximum likelihood estimator $\hat{\lambda}_{ML}$ for $\lambda$ assuming $\alpha$ is known. 

(b) Find the maximum likelihood estimators $\hat{\alpha}_{ML}$ and $\hat{\lambda}_{ML}$ for $\alpha$ and $\lambda$. Assume that the 
function $\Gamma'(\alpha)/\Gamma(\alpha)$ is known. 

Let X = (X, Y) be a jointly Gaussian random vector with zero means, unit variances, 
and unknown correlation coefficient $\rho$. Consider a random sample of n such vectors. 

(a) Show that the ML estimator for $\rho$ involves solving a cubic equation. 

(b) Show that Problem 8.23 gives the ML estimator if the means and variances are unknown. 

(c) Repeat 5 trials of the following: Generate a sample of 100 observations of the pairs 
of zero-mean, unit-variance Gaussian random variables and estimate $\rho$ using parts a 
and b for the cases: $\rho = 0.5$, $\rho = 0.9$, and $\rho = 0$. 


(Invariance Property.) Let $\hat{\Theta}_{ML}$ be the maximum likelihood estimator for the parameter 
$\theta$ of X. Suppose that we are interested instead in finding the maximum likelihood 
estimator for $h(\theta)$, which is an invertible function of $\theta$. Explain why this maximum likelihood 
estimator is given by $h(\hat{\Theta}_{ML})$. 

Show that the Fisher information is also given by Eq. (8.36). Assume that the first two 
partial derivatives of the likelihood function exist and that they are absolutely integrable 
so that differentiation and integration with respect to $\theta$ can be interchanged. 


Show that the following random variables have the given Cramer-Rao lower bound and 
determine whether the associated maximum likelihood estimator is efficient: 


(a) Binomial with parameters n and unknown p: $p(1 - p)/n^2$. 
(b) Gaussian with known variance $\sigma^2$ and unknown mean: $\sigma^2/n$. 
(c) Gaussian with unknown variance: $2\sigma^4/n$. Consider two cases: mean known; mean 
unknown. Does the standard unbiased estimator for the variance achieve the 
Cramer-Rao lower bound? Note that $E[(X - \mu)^4] = 3\sigma^4$. 
(d) Gamma with known parameter $\alpha$ and unknown $\beta = 1/\lambda$: $\beta^2/(n\alpha)$. 
(e) Poisson with unknown parameter $\alpha$: $\alpha/n$. 

Let $\hat{\Theta}_{ML}$ be the maximum likelihood estimator for the mean of an exponential random 
variable. Suppose we estimate the variance of this exponential random variable using the 
estimator $\hat{\Theta}_{ML}^2$. What is the probability that $\hat{\Theta}_{ML}^2$ is within 5% of the true value of the 
variance? Assume that the number of samples is large. 

Let $\hat{\Theta}_{ML}$ be the maximum likelihood estimator for the mean $\alpha$ of a Poisson random 
variable. Suppose we estimate the probability of no arrivals $P[X = 0] = e^{-\alpha}$ with the estimator 
$e^{-\hat{\Theta}_{ML}}$. Find the probability that this estimator is within 10% of the true value of $P[X = 0]$. 
Assume that the number of samples is large. 
A voltage measurement consists of the sum of a constant unknown voltage and a 
Gaussian-distributed noise voltage of zero mean and variance $10\ \mu\mathrm{V}^2$. Thirty independent 
measurements are made and a sample mean of $100\ \mu\mathrm{V}$ is obtained. Find the corresponding 
95% confidence interval. 
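
For a Gaussian sample with known variance, the confidence interval for the mean is the sample mean plus or minus a normal critical value times the standard deviation of the sample mean. The sketch below shows that computation using Python's `statistics.NormalDist`; the function name and argument names are illustrative, and the numbers in the usage lines are those stated in the problem above.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, variance, n, confidence=0.95):
    """Two-sided confidence interval for the mean when the variance is known."""
    # Critical value z such that P[-z <= Z <= z] = confidence for standard Gaussian Z.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(variance / n)
    return sample_mean - half_width, sample_mean + half_width

if __name__ == "__main__":
    low, high = mean_confidence_interval(100.0, 10.0, 30)  # units: microvolts
    print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

The same function covers the interval-width questions that follow by varying `n` and `confidence`.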


Let $X_j$ be a Gaussian random variable with unknown mean $E[X] = \mu$ and variance 1. 
(a) Find the width of the 95% confidence intervals for $\mu$ for n = 4, 16, 100. 
(b) Repeat for 99% confidence intervals. 


The lifetime of 225 light bulbs is measured and the sample mean and sample variance are 
found to be 223 hr and $100\ \mathrm{hr}^2$, respectively. 

(a) Find a 95% confidence interval for the mean lifetime. 

(b) Find a 95% confidence interval for the variance of the lifetime. 

Let X be a Gaussian random variable with unknown mean and unknown variance. A set 
of 10 independent measurements of X yields 

$$\sum_{j=1}^{10} x_j = 350 \quad\text{and}\quad \sum_{j=1}^{10} x_j^2 = 12{,}645.$$

(a) Find a 90% confidence interval for the mean of X. 

(b) Find a 90% confidence interval for the variance of X. 


Let X be a Gaussian random variable with unknown mean and unknown variance. A set of 10 
independent measurements of X yields a sample mean of 57.3 and a sample variance of 23.2. 


(a) Find the 90%, 95%, and 99% confidence intervals for the mean. 


(b) Repeat part a if a set of 20 measurements had yielded the above sample mean and 
sample variance. 


(c) Find the 90%, 95%, and 99% confidence intervals for the variance in parts a and b. 


A computer simulation program is used to produce 150 samples of a random variable. 
The samples are grouped into 15 batches of ten samples each. The batch sample means 
are listed below: 


0.228 —1.941 0.141 1.979 —0.224 
0.501 —5.907 1.367 1.615 —1.013 
0.397 —3.360 3.330 0.033 —0.976 


(a) Find the 90% confidence interval for the sample mean. 

(b) Repeat this experiment by generating beta random variables with parameters a = 2 
and b = 3. 

(c) Repeat part b using gamma random variables with $\lambda = 1$ and $\alpha = 2$. 

(d) Repeat part b using Pareto random variables with $x_m = 1$ and $\alpha = 3$; $x_m = 1$ and 
$\alpha = 15$. 
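
The method of batch means treats the batch sample means as approximately Gaussian observations and builds a Student's t interval from them. The sketch below applies this to the 15 batch means listed in the problem. The critical value 1.761 is the tabulated 5% upper-tail point of the t-distribution with 14 degrees of freedom, entered by hand because the Python standard library has no t-quantile function; the names are illustrative.

```python
import math
from statistics import mean, stdev

# The 15 batch sample means listed in the problem.
BATCH_MEANS = [
    0.228, -1.941, 0.141, 1.979, -0.224,
    0.501, -5.907, 1.367, 1.615, -1.013,
    0.397, -3.360, 3.330, 0.033, -0.976,
]

# Tabulated value t_{0.05, 14} for a two-sided 90% interval (assumed from a t table).
T_CRIT_90_DF14 = 1.761

def batch_means_interval(batch_means, t_crit):
    """Confidence interval for the mean from batch sample means."""
    m = mean(batch_means)
    # stdev() uses the unbiased (n - 1) sample variance.
    half_width = t_crit * stdev(batch_means) / math.sqrt(len(batch_means))
    return m - half_width, m + half_width

if __name__ == "__main__":
    low, high = batch_means_interval(BATCH_MEANS, T_CRIT_90_DF14)
    print(f"90% confidence interval: ({low:.3f}, {high:.3f})")
```

For parts b through d, the list of batch means would be replaced by means computed from simulated samples of the stated distributions.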

A coin is flipped a total of 500 times, in 10 batches of 50 flips each. The number of heads 

in each of the batches is as follows: 


24, 27, 22, 24, 25, 24, 28, 26, 23, 26. 
(a) Find the 95% confidence interval for the probability of heads p using the method of 
batch means. 


(b) Simulate this experiment by generating Bernoulli random variables with p = 0.25; 
p = 0.01. 


This exercise is intended to check the statement: “If we were to compute confidence 
intervals a large number of times, we would find that approximately $(1 - \alpha) \times 100\%$ 
of the time, the computed intervals would contain the true value of the parameter.” 

(a) Assuming that the mean is unknown and that the variance is known, find the 90% 
confidence interval for the mean of a Gaussian random variable with n = 10. 

(b) Generate 500 batches of 10 zero-mean, unit-variance Gaussian random variables, 
and determine the associated confidence intervals. Find the proportion of 
confidence intervals that include the true mean (which by design is zero). Is this in 
agreement with the confidence level $1 - \alpha = 0.90$? 

(c) Repeat part b using exponential random variables with mean one. Should the 
proportion of intervals including the true mean be given by $1 - \alpha$? Explain. 
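
The coverage experiment in part b can be sketched as follows. The code assumes the known-variance interval (sample mean plus or minus a normal critical value divided by the square root of n) and counts how often it contains the true mean of zero. The function name, defaults, and seed are illustrative.

```python
import math
import random
from statistics import NormalDist

def coverage(batches=500, n=10, confidence=0.90, seed=0):
    """Fraction of known-variance confidence intervals that contain the true mean 0."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z / math.sqrt(n)  # unit variance is assumed known
    hits = 0
    for _ in range(batches):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        hits += abs(xbar) <= half_width  # interval covers 0 iff |xbar| <= half_width
    return hits / batches

if __name__ == "__main__":
    print("observed coverage:", coverage())
```

For part c, the Gaussian generator would be swapped for `rng.expovariate(1.0)` and the test changed to whether the interval contains the true mean of one.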


Generate 160 samples $X_i$ that are uniformly distributed in the interval $[-1, 1]$. 


(a) Suppose that 90% confidence intervals for the mean are to be produced. Find the 
confidence intervals for the mean using the following combinations: 


4 batches of 40 samples each, 
8 batches of 20 samples each, 
16 batches of 10 samples each, and 
32 batches of 5 samples each. 


(b) Redo the experiment in part a 500 times. In each repetition of the experiment, com- 
pute the four confidence intervals defined in part a. Calculate the proportion of time 
in which the above four confidence intervals include the true mean. Which of the 
above combinations of the batch size and number of batches are in better agreement 
with the results predicted by the confidence level? Explain why. 

This exercise explores the behavior of confidence intervals as the number of samples is 

increased. Generate 1000 samples of independent Gaussian random variables with mean 

25 and variance 36. Update and plot the confidence intervals for the mean and variance 

every 50 samples. 
A new Web page design is intended to increase the rate at which customers place orders. 

Prior to the new design, the number of orders in an hour was a Poisson random variable 

with mean 30. Eight one-hour measurements with the new design find an average of 32 

orders completed per hour. 

(a) At a 5% significance level, do the data support the claim that the order placement 
rate has increased? 


(b) Repeat part a at a 1% significance level. 

Carlos and Michael play a game where each flips a coin once: If the outcomes of the tosses 

are the same, then no one wins; but if the outcome is different the player with “heads” wins. 

Michael uses a fair coin but he suspects that Carlos is using a biased coin. 

(a) Find a 10% significance level test for an experiment that counts how many times Car- 
los wins in 6 games to test whether Carlos is cheating. Repeat for n = 12 games. 

(b) Now design a 10% significance level test based on the number of times Carlos’ 
tosses come up heads. Which test is more effective? 

(c) Find the probability of detection if Carlos uses a coin with p = 0.75; p = 0.55. 
The output of a receiver is the sum of the input voltage and a Gaussian random 
variable with zero mean and variance $4\ \mathrm{volt}^2$. A scientist suspects that the receiver input is 
not properly calibrated and has a nonzero input voltage in the absence of a true input 
signal. 

(a) Find a 1% significance level test involving n independent measurements of the out- 
put to test the scientist’s hunch. 


(b) What is the outcome of the test if 10 measurements yield a sample mean of —0.75 volts? 


(c) Find the probability of a Type II error if there is indeed an input voltage of 1 volt; of 
10 millivolts. 


(a) Explain the relationship between the p-value and the significance level $\alpha$ of a test. 


(b) Explain why the p-value provides more information about the test statistic than sim- 
ply stating the outcome of the hypothesis test. 


(c) How should the p-value be calculated in a one-sided test? 
(d) How should the p-value be calculated in a two-sided test? 


The number of photons counted by an optical detector is a Poisson random variable with 
known mean $\alpha$ in the absence of a target and known mean $\beta = 6 > \alpha = 2$ when a target 
is present. Let the null hypothesis correspond to “no target present.” 

(a) Use the Neyman-Pearson method to find a hypothesis test where the false alarm 
probability is set to 5%. 

(b) What is the probability of detection? 

(c) Suppose that n independent measurements of the input are taken. Use trial and 
error to find the value of n required to achieve a false alarm probability of 5% and a 
probability of detection of 90%. 
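
Because the likelihood ratio of two Poisson pmfs with $\beta > \alpha$ increases with the observed count, the Neyman-Pearson test reduces to comparing the count with a threshold. The sketch below finds the smallest threshold whose false alarm probability does not exceed the target. It is a non-randomized simplification: with a discrete statistic, meeting the 5% level exactly would require a randomized decision at the boundary, which is not shown. The function names are hypothetical.

```python
import math

def poisson_tail(k, mean):
    """P[X >= k] for a Poisson random variable with the given mean."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

def threshold_for_false_alarm(mean_h0, max_false_alarm):
    """Smallest k such that rejecting H0 when X >= k has P[false alarm] <= target."""
    k = 0
    while poisson_tail(k, mean_h0) > max_false_alarm:
        k += 1
    return k

if __name__ == "__main__":
    k = threshold_for_false_alarm(2.0, 0.05)
    print("threshold:", k)
    print("false alarm probability:", poisson_tail(k, 2.0))
    print("detection probability:", poisson_tail(k, 6.0))
```

For part c, the sum of n independent counts is Poisson with mean $n\alpha$ or $n\beta$, so the same two functions can be called with scaled means inside a loop over n.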

The breaking strength of plastic bags is a Gaussian random variable. Bags from company 1 
have a mean strength of 8 kilograms and a variance of $1\ \mathrm{kg}^2$; bags from company 2 have a 
mean strength of 9 kilograms and a variance of $1\ \mathrm{kg}^2$. We are interested in determining 
whether a batch of bags comes from company 1 (null hypothesis). Find a hypothesis test 
and determine the number of bags that needs to be tested so that $\alpha$ is 1% and the 
probability of detection is 99%. 

Light Internet users have session times that are exponentially distributed with mean 2 

hours, and heavy Internet users have session times that are exponentially distributed with 

mean 4 hours. 

(a) Use the Neyman-Pearson method to find a hypothesis test to determine whether a 
given user is a light user. Design the test for $\alpha = 5\%$. 


(b) What is the probability of detecting heavy users? 

Normal Internet users have session times that are Pareto distributed with mean 3 hours 
and $\alpha = 3$, and heavy peer-to-peer users have session times that are Pareto distributed 
with $\alpha = 8/7$ and mean 16 hours. 

(a) Use the Neyman-Pearson method to find a hypothesis test to determine whether a 
given user is a normal user. Design the test for a significance level of 1%. 


(b) What is the probability of detecting heavy peer-to-peer users? 

Coin factories A and B produce coins for which the probability of heads p is a beta- 

distributed random variable. Factory A has parameters a = b = 10, and factory B has 

a=b=5. 

(a) Design a hypothesis test for $\alpha = 5\%$ to determine whether a batch is from factory A. 

(b) What is the probability of detecting factory B coins? Hint: Use the Octave function 
beta_inv. Assume that the probability of heads in the batch can be determined 
accurately. 
When operating correctly (null hypothesis), wires from a production line have a mean 
diameter of 2 mm, but under a certain fault condition the wires have a mean diameter of 1.75 mm. 
The diameters are Gaussian distributed with variance $0.04\ \mathrm{mm}^2$. A batch of 10 sample wires 
is selected and the sample mean is found to be 1.82 mm. 


(a) Design a test to determine whether the line is operating correctly. Assume a false 
alarm probability of 5%. 


(b) What is the probability of detecting the fault condition? 

(c) What is the p-value for the above observation? 

Coin 1 is fair and coin 2 has probability of heads 3/4. A test involves flipping a coin repeatedly 
until the first occurrence of heads. The number of tosses is observed. 


(a) Can you design a test to determine whether the fair coin is in use? Assume $\alpha = 5\%$. 
What is the probability of detecting the biased coin? 


(b) Repeat part a if the biased coin has probability 1/4. 


The output of a radio signal detection system is the sum of an input voltage and a 
zero-mean, unit-variance Gaussian random variable. 

(a) Design a hypothesis test, at a significance level $\alpha = 10\%$, to determine whether 
there is a nonzero input assuming n independent measurements of the receiver 
output (so the additive noise terms are iid random variables). 
(b) Find expressions for the Type II error probability and the power of the test in 
part a. 
(c) Plot the power of the test in part a as the input voltage varies from $-\infty$ to $+\infty$ for 
n = 4, 16, 64, 256. 


(a) In Problem 8.60, design a hypothesis test, at a significance level $\alpha$, to determine 
whether there is a positive input assuming n independent measurements. 
(b) Find expressions for the Type II error probability and the power of the test in part a. 
(c) Plot the power of the test in part a as the input voltage varies from $-\infty$ to $+\infty$ for 
n = 4, 16, 64, 256. 


Compare the power curves obtained in Problems 8.60 and 8.61. Explain why the test in 
Problem 8.61 is uniformly most powerful, while the test in Problem 8.60 is not. 


Consider Example 8.27 where we considered 

$H_0$: X is Gaussian with $\mu = 0$ and $\sigma_X^2 = 1$ 
$H_1$: X is Gaussian with $\mu > 0$ and $\sigma_X^2 = 1$. 

Let n = 25, $\alpha = 5\%$. For $\mu = k/2$, $k = 0, 1, 2, \ldots, 5$ perform the following experiment: 
Generate 500 batches of size 25 of the Gaussian random variable with mean $\mu$ and unit 
variance. For each batch determine whether the hypothesis test accepts or rejects the 
null hypothesis. Count the number of Type I errors and Type II errors. Plot the 
empirically obtained power function as a function of $\mu$. 
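
The experiment for the one-sided test can be sketched as below. The code assumes that the rejection region is "sample mean greater than $z_\alpha/\sqrt{n}$", where $z_\alpha$ is the upper $\alpha$ point of the standard Gaussian; this is the usual one-sided test for a Gaussian mean with known unit variance and is taken here as an assumption about Example 8.27. The function names and seed are illustrative.

```python
import math
import random
from statistics import NormalDist

def analytic_power(mu, n=25, alpha=0.05):
    """P[reject H0] when the true mean is mu: 1 - Phi(z_alpha - mu*sqrt(n))."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z - mu * math.sqrt(n))

def empirical_power(mu, n=25, alpha=0.05, batches=500, seed=0):
    """Fraction of simulated batches in which the one-sided test rejects H0."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - alpha) / math.sqrt(n)
    rejections = 0
    for _ in range(batches):
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        rejections += xbar > threshold
    return rejections / batches

if __name__ == "__main__":
    for k in range(6):
        mu = k / 2.0
        print(f"mu={mu:.1f}  empirical={empirical_power(mu):.3f}  "
              f"analytic={analytic_power(mu):.3f}")
```

At $\mu = 0$ the rejections are Type I errors; for $\mu > 0$ the non-rejections are Type II errors, so the printed fraction is the empirical power.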


Repeat Problem 8.63 for the following hypothesis test: 

$H_0$: X is Gaussian with $\mu = 0$ and $\sigma_X^2 = 1$ 
$H_1$: X is Gaussian with $\mu \ne 0$ and $\sigma_X^2 = 1$. 

Let n = 25, $\alpha = 5\%$, and run the experiments for $\mu = \pm k/2$, $k = 0, 1, 2, \ldots, 5$. 


Consider the following three tests for a fair coin: 

(i) $H_0$: p = 0.5 vs. $H_1$: $p \ne 0.5$ 
(ii) $H_0$: p = 0.5 vs. $H_1$: p > 0.5 
(iii) $H_0$: p = 0.5 vs. $H_1$: p < 0.5. 

Assume n = 100 coin tosses in each test and that the rejection regions for the above tests 
are selected for $\alpha = 1\%$. 

(a) Find the power curves for the three tests as a function of p. 
(b) Explain the power curve of the two-sided test in comparison to those of the 
one-sided tests. 


(a) Consider hypothesis test (i) of Problem 8.65 with $\alpha = 5\%$. For p = k/10, k = 1, 
2, ..., 9 perform the following experiment: Generate 500 batches of 100 tosses of a 
coin with probability of heads p. For each batch determine whether the hypothesis 
test accepts or rejects the null hypothesis. Count the number of Type I errors and 
Type II errors. Plot the empirically obtained power function as a function of p. 
(b) Repeat part a for hypothesis test (ii) of Problem 8.65. 


Consider the hypothesis test developed in Example 8.26 to test $H_0$: $m = \mu$ vs. $H_1$: $m > \mu$. 
Suppose we use this test, that is, the associated rejection and acceptance region, for the 
following hypothesis testing problem: 


$H_0$: X is Gaussian with mean $m \le \mu$ and known variance $\sigma^2$ 


$H_1$: X is Gaussian with mean $m > \mu$ and known variance $\sigma^2$. 


Show that the test achieves significance level $\alpha$ or better. Hint: Consider the power 
function of the test in Example 8.26. 


A machine produces disks with mean thickness 2 mm. To test the machine after undergoing 
routine maintenance, 10 sample disks are selected and the sample mean of the thickness is 
found to be 2.2 mm and the sample variance is found to be $0.04\ \mathrm{mm}^2$. 


(a) Find a test to determine if the machine is working properly for $\alpha = 1\%$; $\alpha = 5\%$. 
(b) Find the p-value of the observation. 


A manufacturer claims that its new improved tire design increases tire lifetime from 
50,000 km to 55,000 km. A test of 8 tires gives a sample mean lifetime of 52,500 km and a 
sample standard deviation of 3000 km. 


(a) Find a test to determine if the claim can be supported at a level of $\alpha = 1\%$; $\alpha = 5\%$. 

(b) Find the p-value of the observation. 

A class of 100 engineering freshmen is provided with new laptop computers. The manu- 

facturer claims the charge in the batteries will last four hours. The frosh run a test and 

find a sample mean of 3.3 hours and a sample standard deviation of 0.5 hours. 

(a) Find a test to determine if the manufacturer’s claim can be supported at a 
significance level of $\alpha = 1\%$; $\alpha = 5\%$. 

(b) Find the p-value of the observation. 

Consider the hypothesis test in Example 8.29:

$H_0$: X is Gaussian with $\mu = 0$ and $\sigma_X^2$ unknown
$H_1$: X is Gaussian with $\mu \neq 0$ and $\sigma_X^2$ unknown.

Let $n = 9$, $\alpha = 5\%$, $\sigma_X = 1$. For $\mu = \pm k/2$, $k = 0, 1, 2, \ldots, 5$ perform the following
experiment: Generate 500 batches of size 9 of the Gaussian random variable with mean $\mu$
and unit variance. For each batch determine whether the hypothesis test accepts or rejects
the null hypothesis. Count the number of Type I errors and Type II errors. Plot the
empirically obtained power function as a function of $\mu$. Compare to the expected results.

Repeat Problem 8.71 for the following hypothesis test:

$H_0$: X is Gaussian with $\mu = 0$ and $\sigma_X^2$ unknown
$H_1$: X is Gaussian with $\mu > 0$ and $\sigma_X^2$ unknown.

Let $n = 9$, $\alpha = 5\%$, $\sigma_X = 1$, and $\mu = k/2$, $k = 0, 1, 2, \ldots, 5$.

Consider using the hypothesis test in Example 8.29 when the random variable is not
Gaussian. Design tests for $\alpha = 5\%$, $n = 9$ and for $n = 25$. For $\mu = \pm k/2$, $k = 0, 1, 2, \ldots, 5$
perform the following experiment: Let X be a uniform random variable in the interval
[−1/2, 1/2]. Generate 500 batches of size n of the uniform random variable with mean $\mu$.
For each batch determine whether the hypothesis test accepts or rejects the null
hypothesis. Count the number of Type I errors and Type II errors. Plot the empirically obtained
power function as a function of $\mu$. Compare the empirical data to the values expected for
the Gaussian random variable.


Consider using the hypothesis test in Problem 8.73 when the random variable is an
exponential random variable. Design tests for $\alpha = 5\%$, $\mu = 1$, $n = 9$ and for $n = 25$.
Repeat the experiment for $\mu = k/2$, $k = 1, 2, \ldots, 5$. Compare the empirical data to
the values expected for the Gaussian random variable.

A stealth alarm system works by sending noise signals: A “situation normal” signal is sent 

by transmitting voltages that are Gaussian iid random variables with mean zero and vari- 

ance 4; an “alarm” signal is sent by transmitting iid Gaussian voltages with mean zero and 

variance less than 4. 

(a) Find a 1% level hypothesis test to determine whether the situation is normal 
(null hypothesis) based on the calculation of the sample variance from n voltage 
samples. 

(b) Find the power of the hypothesis test for n = 8, 64, 256 as the variance of the alarm 
signal is varied. 

Repeat Problem 8.75 if the alarm signal uses iid Gaussian voltages that have variance 

greater than 4. 


A stealth system summons Agent 00111 by sending a sequence of 71 Gaussian iid random 
variables with mean zero and variance mọ = 7. Find a hypothesis test (to be implemented 
in Agent 00111’s wristwatch) to determine, at a 1% level, that she is being summoned. Plot
the probability of Type II error.


Consider the hypothesis test in Example 8.30 for testing the variance:

$H_0$: X is Gaussian with $\sigma_X^2 = 1$ and $m$ unknown

$H_1$: X is Gaussian with $\sigma_X^2 \neq 1$ and $m$ unknown.

Let $n = 16$, $\alpha = 5\%$, $\mu = 0$. For $\sigma_X^2 = k/3$, $k = 1, 2, \ldots, 6$ perform the following
experiment: Generate 500 batches of size 16 of the Gaussian random variable with zero mean
and variance $\sigma_X^2$. For each batch determine whether the hypothesis test accepts or rejects
the null hypothesis. Count the number of Type I errors and Type II errors. Plot the power
function as a function of $\sigma_X^2$. Compare to the expected results.


Consider using the hypothesis test in Problem 8.78 when the random variable is a uni- 
form random variable. Repeat the experiment where X is now a uniform random vari- 
able in the interval [—1/2, 1/2]. Compare the empirical data to the values expected for 
the Gaussian random variable. Repeat the experiment for n = 9 and n = 36. 

In this exercise we explore the relation between confidence intervals and hypothesis
testing. Consider the hypothesis test in Example 8.28 but with a level of $\alpha = 5\%$.

(a) Run 200 trials of the following experiment: Generate 10 samples of X given that $H_0$
is true; determine the confidence interval; determine if the interval includes 0;
determine if the null hypothesis is accepted.

(b) Is the relative frequency of Type I error as expected?

The Premium Pen Factory tests one pen in each batch of 100 pens. The ink-filling
machine is bipolar, so pens can write continuously for an exponential duration of mean
either 1/2 hour or 5 hours. The machine is in the short-life production mode 10% of the
time. A batch of short-life pens sold as long-life pens results in a loss of $5, while a batch
of long-life pens mistakenly sold as short-life results in a loss of $3. Find the Bayes
decision rule to decide whether a batch is long-life or short-life based on the measured
lifetime of the test pen.

Suppose we send binary information over an erasure channel. If the input to the channel
is “0”, then the output is equally likely to be “0” or “e” for “erased”; and if the input is “1”
then the outputs are equally likely to be “1” or “e.” Assume that
$P[\Theta = 1] = 1/4 = 1 - P[\Theta = 0]$, and that the cost functions are: $C_{00} = C_{11} = 0$ and $C_{01} = bC_{10}$.

(a) For b = 1/6, 1, and 6, find the maximum likelihood decision rule, which picks the 
input that maximizes the likelihood probability for the observed output. Find the av- 
erage cost for each case. 

(b) For the three cases in part a, find the Bayes decision rule that minimizes the average 
cost. Find the average cost for each case. 

For the channel in Problem 8.82, suppose we transmit each input twice. The receiver 

makes its decision based on the observed pair of outputs. Find and compare the maxi- 

mum likelihood and the Bayes’ decision rules. 

When Bob throws a dart the coordinates of the landing point are a Gaussian pair of in- 

dependent random variables (X, Y) with zero mean and variance 1. When Rick throws 

the dart the coordinates are also a Gaussian independent pair but with zero mean and 
variance 4. Bob and Rick are asked to draw a circle centered about the origin with the 
inner disk assigned to Bob and the outer ring assigned to Rick. 

(a) Whenever either player lands on the other player’s area, he must pay $1 to the
house. Find the disk radius that minimizes the players’ average cost.

(b) Repeat part a if Bob must pay $2 when he lands in Rick’s area. 

A binary communications system accepts $\Theta$, which is “0” or “1”, as input and outputs X,
“0” or “1”, with probability of error $P[\Theta \neq X] = p = 10\%$. Suppose the sender uses a
repetition code whereby each “0” or “1” is transmitted n independent times, and the
receiver makes its decision based on the $n = 8$ corresponding outputs. Assume that
$1/5 = P[\Theta = 1] = \alpha = 1 - P[\Theta = 0]$.

(a) Find the maximum likelihood decision rule that selects the input which is more
likely for the given n outputs. Find the probability of Type I and Type II errors, as well as
the overall probability of error $P_e$.

(b) Find the Bayes decision rule that minimizes the probability of error. Find the
probability of Type I and Type II errors, as well as $P_e$.

(c) For the decision rules in parts a and b find n so that $P_e$ = 10°.

A binary communications system accepts $\Theta$, which is “+1” or “−1”, as input and outputs
$X = \Theta + N$, where N is a zero-mean Gaussian random variable with variance $\sigma^2$. The
sender uses a repetition code where each “+1” or “−1” is transmitted n times, and the
receiver makes its decision based on the n outputs. Assume $P[\Theta = 1] = \alpha = 1 - P[\Theta = -1]$.
(a) Find the maximum likelihood decision rule and evaluate its Type I and Type II error 
probabilities as well as its overall probability of error. 

(b) Find the Bayes decision rule and compare its error probabilities to part a. 

(c) Suppose $\sigma$ is such that $P[N > 1]$ = 10°. Find the value of n in part b so that $P_e$ = 10°.

A widely used digital radio system transmits pairs of bits at a time. The input to the
system is a pair $(\Theta_1, \Theta_2)$ where $\Theta_i$ can be +1 or −1 and where the output of the channel
is a pair of independent Gaussian random variables (X, Y) with variance $\sigma^2$ and means
$\Theta_1$ and $\Theta_2$, respectively. Assume $P[\Theta_i = 1] = \alpha = 1 - P[\Theta_i = -1]$ and that the input
bits are independent of each other. The receiver observes the pair (X, Y) and based on
their values decides on the input pair $(\Theta_1, \Theta_2)$.


(a) Plot $f_{X,Y}(x, y \mid \Theta_1, \Theta_2)$ for the four possible input pairs.

(b) Let the cost be zero if the receiver correctly identifies the input pair, and let the cost
be one otherwise. Show that the Bayes’ decision rule selects the input pair $(\theta_1, \theta_2)$
that maximizes:

$$f_{X,Y}(x, y \mid \theta_1, \theta_2)\, P[\Theta_1 = \theta_1, \Theta_2 = \theta_2].$$


(c) Find the four decision regions in the plane when the inputs are equally likely. Show 
that this corresponds to the maximum likelihood decision rule. 

Show that the Bayes estimator for the cost function $C(g(X), \Theta) = |g(X) - \Theta|$ is given
by the median of the a posteriori pdf $f_\Theta(\theta \mid X)$. Hint: Write the integral for the average
cost as the sum of two integrals over the regions $g(X) > \theta$ and $g(X) < \theta$, and then
differentiate with respect to g(X).

Show that the Bayes’ estimator for the cost function in Eq. (8.96) is given by the MAP
estimator for $\Theta$.


Let the observations $X_1, X_2, \ldots, X_n$ be iid Gaussian random variables with unit variance
and unknown mean $\Theta$. Suppose that $\Theta$ is itself a Gaussian random variable with mean 0
and variance $\sigma^2$. Find the following estimators:

(a) The minimum mean square estimator for $\Theta$.

(b) The minimum mean absolute error estimator for $\Theta$.

(c) The MAP estimator for $\Theta$.

Let X be a uniform random variable in the interval $(0, \Theta)$, where $\Theta$ has a gamma
distribution $f_\Theta(\theta) = \theta e^{-\theta}$ for $\theta > 0$.

(a) Find the estimator that minimizes the mean absolute error. 

(b) Find the estimator that minimizes the mean square error. 


Let X be a binomial random variable with parameters n and $\Theta$. Suppose that $\Theta$ has a
beta distribution with parameters $\alpha$ and $\beta$.

(a) Show that $f_\Theta(\theta \mid X = k)$ is a beta pdf with parameters $\alpha + k$ and $\beta + n - k$.
(b) Show that the minimum mean square estimator is then $(\alpha + k)/(\alpha + \beta + n)$.


Let X be a binomial random variable with parameters n and $\Theta$. Suppose that $\Theta$ is uniform
in the interval [0, 1]. Consider the following cost function which emphasizes the errors at
the extreme values of $\theta$:

$$C(g(X), \theta) = \frac{(\theta - g(X))^2}{\theta(1 - \theta)}.$$

Show that the Bayes estimator is given by


Section 8.7: Testing the Fit of a Distribution to Data 


The following histogram was obtained by counting the occurrence of the first digits in 
telephone numbers in one column of a telephone directory: 


digit 0 1 2 3 4 5 6 7 8 9
observed 0 0 24 2 25 3 32 15 22 


Test the goodness of fit of this data to a random variable that is uniformly distributed in 
the set {0, 1,...,9} at a 1% significance level. Repeat for the set {2, 3,..., 9}. 


A die is tossed 96 times and the number of times each face occurs is counted: 


k      1   2   3   4   5   6
n_k   25   8  17  20  13  13


(a) Test the goodness of fit of the data to the pmf of a fair die at a 5% significance level. 

(b) Run the following experiment 100 times: Generate 50 iid random variables from 
the discrete pmf {1/6, 1/6, 1/6, 1/6, 3/24, 5/24}. Test the goodness of fit of this data to 
tosses from a fair die. What is the relative frequency with which the null hypothesis 
is rejected? 

(c) Repeat part b using a sample size of 100 iid random variables. 

(a) Show that the $D^2$ statistic when K = 2 is:

$$D^2 = \frac{(n_1 - np_1)^2}{np_1(1 - p_1)} = \left(\frac{n_1 - np_1}{\sqrt{np_1(1 - p_1)}}\right)^2.$$

(b) Explain why $D^2$ approaches a chi-square random variable with 1 degree of freedom
as n becomes large.


(a) Repeat the following experiment 500 times: Generate 100 samples of the sum X
of 10 iid uniform random variables from the unit interval. Perform a goodness-of-fit
test of the random samples of X to the Gaussian random variable with the same 
mean and variance. What is the relative frequency with which the null hypothesis is 
rejected at a 5% level? 

(b) Repeat part a for sums of 20 iid uniform random variables. 

Repeat Problem 8.97 for the sum of exponential random variables with mean 1. 

A computer simulation program gives pairs of numbers (X, Y) that are supposed to be 

uniformly distributed in the unit square. Use the chi-square test to assess the goodness of 

fit of the computer output. 

Use the approach in Problem 8.99 to develop a test for the independence between two 

random variables X and Y. 

Problems Requiring Cumulative Knowledge


You are asked to characterize the behavior of a new binary communications system in 
which the inputs are {0, 1} and the outputs are {0, 1}. Design a series of tests to charac- 
terize the errors introduced in transmissions using the system. How would you estimate 
the probability of error p? How would you determine whether the p is fixed or whether 
it varies? How would you determine whether errors introduced by the system are inde- 
pendent of each other? How would you determine whether the errors introduced by the 
system are dependent on the input? 

You are asked to characterize the behavior of a new binary communications system in 
which the inputs are {0, 1} and the outputs assume a continuum of real values. What tests 
would you change and what tests would you keep from Problem 8.101? 


Your summer job with the local bus company entails sitting at a busy intersection and 
recording the bus arrival times for several routes in a table next to their scheduled times. 
How would you characterize the arrival time behavior of the buses? 


Your friend Khash has a summer job with an Internet access provider that involves char- 
acterizing the packet transit times to various key sites on the Internet. Your friend has ac- 
cess to some nifty hardware for generating test packets, including GPS systems, to 
provide accurate timestamps. How would your friend go about characterizing these tran- 
sit times? 

Leigh’s summer job is with a startup testing a new optical device. Leigh runs a standard 
test on these devices to determine their failure rates and failure root causes. He looks at 
the dependence of failures on the supplier, on impurities in the devices, and on different 
approaches to preparing the devices. How should Leigh go about characterizing failure 
rate behavior? How should he identify root causes for failures? 


CHAPTER 9


Random Processes


In certain random experiments, the outcome is a function of time or space. For example,
in speech recognition systems, decisions are made on the basis of a voltage waveform
corresponding to a speech utterance. In an image processing system, the intensity
and color of the image vary over a rectangular region. In a peer-to-peer network, the
number of peers in the system varies with time. In some situations, two or more func- 
tions of time may be of interest. For example, the temperature in a certain city and the 
demand placed on the local electric power utility vary together in time. 

The random time functions in the above examples can be viewed as numerical 
quantities that evolve randomly in time or space. Thus what we really have is a family 
of random variables indexed by the time or space variable. In this chapter we begin the 
study of random processes. We will proceed as follows: 


• In Section 9.1 we introduce the notion of a random process (or stochastic
process), which is defined as an indexed family of random variables.

• We are interested in specifying the joint behavior of the random variables within
a family (i.e., the temperature at two time instants). In Section 9.2 we see that this
is done by specifying joint distribution functions, as well as mean and covariance
functions.

• In Sections 9.3 to 9.5 we present examples of stochastic processes and show how
models of complex processes can be developed from a few simple models.

• In Section 9.6 we introduce the class of stationary random processes that can be
viewed as random processes in “steady state.”

• In Section 9.7 we investigate the continuity properties of random processes and
define their derivatives and integrals.

• In Section 9.8 we examine the properties of time averages of random processes
and the problem of estimating the parameters of a random process.

• In Section 9.9 we describe methods for representing random processes by Fourier
series and by the Karhunen-Loeve expansion.

• Finally, in Section 9.10 we present methods for generating random processes.


9.1 DEFINITION OF A RANDOM PROCESS


Consider a random experiment specified by the outcomes $\zeta$ from some sample space S,
by the events defined on S, and by the probabilities on these events. Suppose that to
every outcome $\zeta \in S$, we assign a function of time according to some rule:

$$X(t, \zeta) \qquad t \in I.$$

The graph of the function $X(t, \zeta)$ versus t, for $\zeta$ fixed, is called a realization, sample
path, or sample function of the random process. Thus we can view the outcome of the
random experiment as producing an entire function of time as shown in Fig. 9.1. On the
other hand, if we fix a time $t_k$ from the index set I, then $X(t_k, \zeta)$ is a random variable
(see Fig. 9.1) since we are mapping $\zeta$ onto a real number. Thus we have created a
family (or ensemble) of random variables indexed by the parameter t, $\{X(t, \zeta), t \in I\}$.
This family is called a random process. We also refer to random processes as stochastic
processes. We usually suppress the $\zeta$ and use X(t) to denote a random process.

A stochastic process is said to be discrete-time if the index set I is a countable set
(i.e., the set of integers or the set of nonnegative integers). When dealing with
discrete-time processes, we usually use n to denote the time index and $X_n$ to denote the random
process. A continuous-time stochastic process is one in which I is continuous (i.e., the
real line or the nonnegative real line).

The following example shows how we can imagine a stochastic process as resulting
from nature selecting $\zeta$ at the beginning of time and gradually revealing it in time
through $X(t, \zeta)$.


FIGURE 9.1
Several realizations of a random process.

Example 9.1 Random Binary Sequence


Let $\zeta$ be a number selected at random from the interval S = [0, 1], and let $b_1 b_2 \ldots$ be the binary
expansion of $\zeta$:

$$\zeta = \sum_{i=1}^{\infty} b_i 2^{-i} \qquad \text{where } b_i \in \{0, 1\}.$$

Define the discrete-time random process $X(n, \zeta)$ by

$$X(n, \zeta) = b_n \qquad n = 1, 2, \ldots.$$

The resulting process is a sequence of binary numbers, with $X(n, \zeta)$ equal to the nth number in
the binary expansion of $\zeta$.
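
The construction in Example 9.1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original example: the function names `binary_digits` and `draw_realization` are hypothetical, and the sketch assumes the outcome lies in [0, 1) (the endpoint 1 has probability zero) and that only the first 50 or so digits are meaningful, since a floating-point number carries 53 significant bits.

```python
import random


def binary_digits(zeta, n_digits):
    """Return [b_1, ..., b_n], the leading binary digits of zeta in [0, 1).

    Repeated doubling moves the next digit into the integer part.
    """
    bits = []
    for _ in range(n_digits):
        zeta *= 2.0
        bit = int(zeta)      # integer part is the next binary digit
        bits.append(bit)
        zeta -= bit          # keep the fractional part for the next step
    return bits


def draw_realization(n_digits, rng):
    """Select zeta at random and return (zeta, [X(1, zeta), ..., X(n, zeta)])."""
    zeta = rng.random()      # uniform in [0, 1)
    return zeta, binary_digits(zeta, n_digits)
```

Calling `draw_realization(16, random.Random())` returns one outcome together with the first 16 values of its sample path; a fixed outcome always yields the same path, which is the sense in which the outcome is selected once and then revealed in time.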


Example 9.2 Random Sinusoids 
Let $\zeta$ be selected at random from the interval [−1, 1]. Define the continuous-time random
process $X(t, \zeta)$ by

$$X(t, \zeta) = \zeta \cos(2\pi t) \qquad -\infty < t < \infty.$$

The realizations of this random process are sinusoids with amplitude $\zeta$, as shown in Fig. 9.2(a).
Let $\zeta$ be selected at random from the interval $(-\pi, \pi)$ and let $Y(t, \zeta) = \cos(2\pi t + \zeta)$.
The realizations of $Y(t, \zeta)$ are phase-shifted versions of $\cos 2\pi t$ as shown in Fig. 9.2(b).


FIGURE 9.2 
(a) Sinusoid with random amplitude, (b) Sinusoid with random 
phase. 

The randomness in $\zeta$ induces randomness in the observed function $X(t, \zeta)$. In
principle, one can deduce the probability of events involving a stochastic process at
various instants of time from probabilities involving $\zeta$ by using the equivalent-event
method introduced in Chapter 4.


Example 9.3 
Find the following probabilities for the random process introduced in Example 9.1:
$P[X(1, \zeta) = 0]$ and $P[X(1, \zeta) = 0 \text{ and } X(2, \zeta) = 1]$.

The probabilities are obtained by finding the equivalent events in terms of $\zeta$:

$$P[X(1, \zeta) = 0] = P\left[0 \le \zeta < \tfrac{1}{2}\right] = \tfrac{1}{2}$$

$$P[X(1, \zeta) = 0 \text{ and } X(2, \zeta) = 1] = P\left[\tfrac{1}{4} \le \zeta < \tfrac{1}{2}\right] = \tfrac{1}{4},$$

since all points in the interval $[0 \le \zeta < 1/2]$ begin with $b_1 = 0$ and all points in [1/4, 1/2) begin
with $b_1 = 0$ and $b_2 = 1$. Clearly, any sequence of k bits has a corresponding subinterval of length
(and hence probability) $2^{-k}$.
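
The two probabilities can also be checked by simulation. The sketch below is only an illustration under stated assumptions: the helper names `first_bits` and `estimate_probabilities` are hypothetical, the outcome is drawn with Python's standard uniform generator, and the trial count and seed are arbitrary choices.

```python
import random


def first_bits(zeta, k):
    """Leading k binary digits of zeta in [0, 1), found by repeated doubling."""
    bits = []
    for _ in range(k):
        zeta *= 2.0
        bit = int(zeta)
        bits.append(bit)
        zeta -= bit
    return bits


def estimate_probabilities(trials=100_000, seed=1):
    """Relative frequencies of {X(1)=0} and {X(1)=0 and X(2)=1}."""
    rng = random.Random(seed)
    count_first_zero = 0
    count_zero_then_one = 0
    for _ in range(trials):
        b1, b2 = first_bits(rng.random(), 2)
        if b1 == 0:
            count_first_zero += 1
            if b2 == 1:
                count_zero_then_one += 1
    return count_first_zero / trials, count_zero_then_one / trials
```

The two relative frequencies returned by `estimate_probabilities()` should be close to 1/2 and 1/4, the lengths of the equivalent subintervals of [0, 1].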


Example 9.4 


Find the pdf of $X_0 = X(t_0, \zeta)$ and $Y(t_0, \zeta)$ in Example 9.2.

If $t_0$ is such that $\cos(2\pi t_0) = 0$, then $X(t_0, \zeta) = 0$ for all $\zeta$ and the pdf of $X(t_0)$ is a delta
function of unit weight at x = 0. Otherwise, $X(t_0, \zeta)$ is uniformly distributed in the interval
$(-\cos 2\pi t_0, \cos 2\pi t_0)$ since $\zeta$ is uniformly distributed in [−1, 1] (see Fig. 9.3a). Note that the pdf
of $X(t_0, \zeta)$ depends on $t_0$.

The approach used in Example 4.36 can be used to show that $Y(t_0, \zeta)$ has an arcsine
distribution:

$$f_Y(y) = \frac{1}{\pi\sqrt{1 - y^2}}, \qquad |y| < 1$$

(see Fig. 9.3b). Note that the pdf of $Y(t_0, \zeta)$ does not depend on $t_0$.

Figure 9.3(c) shows a histogram of 1000 samples of the amplitudes $X(t_0, \zeta)$ at $t_0 = 0$,
which can be seen to be approximately uniformly distributed in [−1, 1]. Figure 9.3(d) shows the
histogram for the samples of the sinusoid with random phase. Clearly there is agreement with
the arcsine pdf.
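
A numerical check in the spirit of Fig. 9.3(d) can be sketched as follows. Rather than a histogram, the sketch compares the empirical cdf of samples of the random-phase sinusoid with the arcsine cdf $F_Y(y) = 1/2 + \arcsin(y)/\pi$, which is obtained here by integrating the pdf above and is not stated in the example. The function names and the sample size are assumptions made for illustration.

```python
import math
import random


def arcsine_cdf(y):
    """cdf obtained by integrating 1 / (pi * sqrt(1 - y^2)) from -1 to y."""
    return 0.5 + math.asin(y) / math.pi


def sample_random_phase(t0, n, rng):
    """n samples of Y(t0) = cos(2*pi*t0 + phase), phase uniform in (-pi, pi)."""
    return [math.cos(2.0 * math.pi * t0 + rng.uniform(-math.pi, math.pi))
            for _ in range(n)]


def empirical_cdf(samples, y):
    """Fraction of samples that are less than or equal to y."""
    return sum(1 for s in samples if s <= y) / len(samples)
```

Evaluating `empirical_cdf` at a few points for two different values of `t0` should give nearly the same numbers, which illustrates that the distribution of the random-phase sinusoid does not depend on the sampling time.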


In general, the sample paths of a stochastic process can be quite complicated
and cannot be described by simple formulas. In addition, it is usually not possible to
identify an underlying probability space for the family of observed functions of time.
Thus the equivalent-event approach for computing the probability of events involving
$X(t, \zeta)$ in terms of the probabilities of events involving $\zeta$ does not prove useful in
practice. In the next section we show an alternative method for specifying the
probabilities of events involving a stochastic process.

FIGURE 9.3
(a) pdf of sinusoid with random amplitude. (b) pdf of sinusoid with random phase. (c) Histogram of samples from
uniform amplitude sinusoid at t = 0. (d) Histogram of samples from random phase sinusoid at t = 0.



9.2 SPECIFYING A RANDOM PROCESS


There are many questions regarding random processes that cannot be answered with
just knowledge of the distribution at a single time instant. For example, we may be
interested in the temperature at a given locale at two different times. This requires the
following information:

$$P[x_1 < X(t_1) \le x_1', \; x_2 < X(t_2) \le x_2'].$$

In another example, the speech compression system in a cellular phone predicts the
value of the speech signal at the next sampling time based on the previous k samples.
Thus we may be interested in the following probability:

$$P[a < X(t_{k+1}) \le b \mid X(t_1) = x_1, X(t_2) = x_2, \ldots, X(t_k) = x_k].$$

It is clear that a general description of a random process should provide probabilities
for vectors of samples of the process.


Joint Distributions of Time Samples 


Let $X_1, X_2, \ldots, X_k$ be the k random variables obtained by sampling the random
process $X(t, \zeta)$ at the times $t_1, t_2, \ldots, t_k$:

$$X_1 = X(t_1, \zeta), \; X_2 = X(t_2, \zeta), \; \ldots, \; X_k = X(t_k, \zeta),$$

as shown in Fig. 9.1. The joint behavior of the random process at these k time instants
is specified by the joint cumulative distribution of the vector random variable
$X_1, X_2, \ldots, X_k$. The probabilities of any event involving the random process at all or
some of these time instants can be computed from this cdf using the methods
developed for vector random variables in Chapter 6. Thus, a stochastic process is specified by
the collection of kth-order joint cumulative distribution functions:

$$F_{X_1, \ldots, X_k}(x_1, x_2, \ldots, x_k) = P[X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_k) \le x_k], \tag{9.1}$$


for any k and any choice of sampling instants $t_1, \ldots, t_k$. Note that the collection of cdf’s
must be consistent in the sense that lower-order cdf’s are obtained as marginals of
higher-order cdf’s. If the stochastic process is continuous-valued, then a collection of
probability density functions can be used instead:

$$f_{X_1, \ldots, X_k}(x_1, x_2, \ldots, x_k)\, dx_1 \cdots dx_k = P[x_1 < X(t_1) \le x_1 + dx_1, \ldots, x_k < X(t_k) \le x_k + dx_k]. \tag{9.2}$$

If the stochastic process is discrete-valued, then a collection of probability mass
functions can be used to specify the stochastic process:

$$p_{X_1, \ldots, X_k}(x_1, x_2, \ldots, x_k) = P[X(t_1) = x_1, X(t_2) = x_2, \ldots, X(t_k) = x_k] \tag{9.3}$$

for any k and any choice of sampling instants $t_1, \ldots, t_k$.

At first glance it does not appear that we have made much progress in specifying 
random processes because we are now confronted with the task of specifying a vast 
collection of joint cdf’s! However, this approach works because most useful models of 
stochastic processes are obtained by elaborating on a few simple models, so the meth- 
ods developed in Chapters 5 and 6 of this book can be used to derive the required cdf’s. 
The following examples give a preview of how we construct complex models from sim- 
ple models. We develop these important examples more fully in Sections 9.3 to 9.5. 


Example 9.5 iid Bernoulli Random Variables 


Let $X_n$ be a sequence of independent, identically distributed Bernoulli random variables with
p = 1/2. The joint pmf for any k time samples is then

$$P[X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k] = P[X_1 = x_1] \cdots P[X_k = x_k] = \left(\frac{1}{2}\right)^k$$

where $x_i \in \{0, 1\}$ for all i. This binary random process is equivalent to the one discussed in
Example 9.1.


Example 9.6 iid Gaussian Random Variables 


Let $X_n$ be a sequence of independent, identically distributed Gaussian random variables with
zero mean and variance $\sigma_X^2$. The joint pdf for any k time samples is then

$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k) = \frac{1}{(2\pi\sigma_X^2)^{k/2}}\, e^{-(x_1^2 + x_2^2 + \cdots + x_k^2)/2\sigma_X^2}.$$
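
The joint pdf in Example 9.6 is simply the product of k identical one-dimensional Gaussian pdf's. The short sketch below, with hypothetical function names, evaluates the closed form and the product of marginals so that the two can be compared at any point; it assumes zero mean and a common standard deviation `sigma`.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian pdf with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


def joint_pdf_closed_form(xs, sigma):
    """Closed-form joint pdf of k iid zero-mean Gaussian samples."""
    k = len(xs)
    exponent = -sum(x * x for x in xs) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (2.0 * math.pi * sigma ** 2) ** (k / 2.0)


def joint_pdf_product(xs, sigma):
    """Same joint pdf computed as the product of the k marginal pdf's."""
    value = 1.0
    for x in xs:
        value *= gaussian_pdf(x, sigma)
    return value
```

The agreement of the two functions for any choice of `xs` reflects the independence of the time samples.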


The following two examples show how more complex and interesting processes 
can be built from iid sequences. 


Example 9.7 Binomial Counting Process 


Let $X_n$ be a sequence of independent, identically distributed Bernoulli random variables with
p = 1/2. Let $S_n$ be the number of 1’s in the first n trials:

$$S_n = X_1 + X_2 + \cdots + X_n \qquad \text{for } n = 0, 1, \ldots.$$

$S_n$ is an integer-valued nondecreasing function of n that grows by unit steps after a random
number of time instants. From previous chapters we know that $S_n$ is a binomial random variable with
parameters n and p = 1/2. In the next section we show how to find the joint pmf’s of $S_n$ using
conditional probabilities.
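
A sample path of the binomial counting process is easy to generate, and the binomial pmf of a single time sample can be checked against relative frequencies. The following is a minimal sketch with hypothetical function names; it takes $S_0 = 0$ as the empty sum and uses Python's standard pseudorandom generator.

```python
import random
from math import comb


def counting_path(n, p, rng):
    """Return [S_0, S_1, ..., S_n] for iid Bernoulli(p) increments, with S_0 = 0."""
    s = 0
    path = [0]
    for _ in range(n):
        s += 1 if rng.random() < p else 0   # Bernoulli trial X_j
        path.append(s)
    return path


def binomial_pmf(k, n, p):
    """P[S_n = k] for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)
```

For example, repeating `counting_path(10, 0.5, rng)` many times and counting how often the final value equals 5 should give a relative frequency near `binomial_pmf(5, 10, 0.5)`, which is 252/1024.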


Example 9.8 Filtered Noisy Signal 


Let $X_j$ be a sequence of independent, identically distributed observations of a signal voltage $\mu$
corrupted by zero-mean Gaussian noise $N_j$ with variance $\sigma^2$:

$$X_j = \mu + N_j \qquad \text{for } j = 1, 2, \ldots.$$

Consider the signal that results from averaging the sequence of observations:

$$S_n = (X_1 + X_2 + \cdots + X_n)/n \qquad \text{for } n = 1, 2, \ldots.$$

From previous chapters we know that $S_n$ is the sample mean of an iid sequence of Gaussian
random variables. We know that $S_n$ itself is a Gaussian random variable with mean $\mu$ and variance
$\sigma^2/n$, and so it tends towards the value $\mu$ as n increases. In a later section, we show that $S_n$ is an
example from the class of Gaussian random processes.
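
The behavior of the averaged signal can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than a prescribed method: the function names are hypothetical, the averaging starts at n = 1, and the noise is generated with the standard library's Gaussian generator.

```python
import random
import statistics


def running_sample_mean(mu, sigma, n, rng):
    """Return [S_1, ..., S_n], where S_j averages the first j noisy observations."""
    total = 0.0
    means = []
    for j in range(1, n + 1):
        total += mu + rng.gauss(0.0, sigma)   # X_j = mu + N_j
        means.append(total / j)
    return means


def final_means(mu, sigma, n, paths, rng):
    """Collect S_n from a number of independent realizations."""
    return [running_sample_mean(mu, sigma, n, rng)[-1] for _ in range(paths)]
```

With `mu = 1`, `sigma = 2`, and `n = 25`, the values returned by `final_means` should have an average near 1 and a variance near $\sigma^2/n = 0.16$, in line with the statement that the averaged signal concentrates around the signal voltage as n grows.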


The Mean, Autocorrelation, and Autocovariance Functions 


The moments of time samples of a random process can be used to partially specify the 
random process because they summarize the information contained in the joint cdf’s. 
The mean function $m_X(t)$ and the variance function VAR[X(t)] of a continuous-time
random process X(t) are defined by

$$m_X(t) = E[X(t)] = \int_{-\infty}^{\infty} x f_{X(t)}(x)\, dx, \tag{9.4}$$

and

$$\mathrm{VAR}[X(t)] = \int_{-\infty}^{\infty} (x - m_X(t))^2 f_{X(t)}(x)\, dx, \tag{9.5}$$

where $f_{X(t)}(x)$ is the pdf of X(t). Note that $m_X(t)$ and VAR[X(t)] are deterministic
functions of time. Trends in the behavior of X(t) are reflected in the variation of $m_X(t)$
with time. The variance gives an indication of the spread in the values taken on by X(t)
at different time instants.

The autocorrelation $R_X(t_1, t_2)$ of a random process X(t) is defined as the joint
moment of $X(t_1)$ and $X(t_2)$:

$$R_X(t_1, t_2) = E[X(t_1)X(t_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{X(t_1),X(t_2)}(x, y)\, dx\, dy, \tag{9.6}$$

where $f_{X(t_1),X(t_2)}(x, y)$ is the second-order pdf of X(t). In general, the autocorrelation
is a function of $t_1$ and $t_2$. Note that $R_X(t, t) = E[X^2(t)]$.
The autocovariance $C_X(t_1, t_2)$ of a random process X(t) is defined as the
covariance of $X(t_1)$ and $X(t_2)$:

$$C_X(t_1, t_2) = E[\{X(t_1) - m_X(t_1)\}\{X(t_2) - m_X(t_2)\}]. \tag{9.7}$$

From Eq. (5.30), the autocovariance can be expressed in terms of the autocorrelation
and the means:

$$C_X(t_1, t_2) = R_X(t_1, t_2) - m_X(t_1)m_X(t_2). \tag{9.8}$$

Note that the variance of X(t) can be obtained from $C_X(t_1, t_2)$:

$$\mathrm{VAR}[X(t)] = E[(X(t) - m_X(t))^2] = C_X(t, t). \tag{9.9}$$


The correlation coefficient of X(t) is defined as the correlation coefficient of
$X(t_1)$ and $X(t_2)$ (see Eq. 5.31):

$$\rho_X(t_1, t_2) = \frac{C_X(t_1, t_2)}{\sqrt{C_X(t_1, t_1)}\,\sqrt{C_X(t_2, t_2)}}. \tag{9.10}$$

From Eq. (5.32) we have that $|\rho_X(t_1, t_2)| \le 1$. Recall that the correlation coefficient is
a measure of the extent to which a random variable can be predicted as a linear
function of another. In Chapter 10, we will see that the autocovariance function and the
autocorrelation function play a critical role in the design of linear methods for analyzing
and processing random signals.
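
When the pdf's in these definitions are not available, the same quantities can be approximated by averaging over an ensemble of realizations. The sketch below is one simple way to do this and is an illustration only: it assumes each realization is stored as a list of samples taken at common time instants, indexed by `i` and `j`, and all function names are hypothetical.

```python
import math


def ensemble_mean(paths, i):
    """Average of the i-th time sample over all realizations."""
    return sum(path[i] for path in paths) / len(paths)


def ensemble_autocorrelation(paths, i, j):
    """Average of the product of the i-th and j-th time samples."""
    return sum(path[i] * path[j] for path in paths) / len(paths)


def ensemble_autocovariance(paths, i, j):
    """Autocorrelation estimate minus the product of the mean estimates."""
    return (ensemble_autocorrelation(paths, i, j)
            - ensemble_mean(paths, i) * ensemble_mean(paths, j))


def ensemble_correlation_coefficient(paths, i, j):
    """Autocovariance estimate normalized by the two standard deviations."""
    c_ij = ensemble_autocovariance(paths, i, j)
    c_ii = ensemble_autocovariance(paths, i, i)
    c_jj = ensemble_autocovariance(paths, j, j)
    return c_ij / (math.sqrt(c_ii) * math.sqrt(c_jj))
```

For the two-realization ensemble `[[1, 2], [3, 6]]`, in which the second sample is always twice the first, these functions give means 2 and 4, autocorrelation 10, autocovariance 2, and correlation coefficient 1.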

The mean, variance, autocorrelation, and autocovariance functions for discrete-time
random processes are defined in the same manner as above. We use a slightly
different notation for the time index. The mean and variance of a discrete-time random
process $X_n$ are defined as:

$$m_X(n) = E[X_n] \quad \text{and} \quad \mathrm{VAR}[X_n] = E[(X_n - m_X(n))^2]. \tag{9.11}$$

The autocorrelation and autocovariance functions of a discrete-time random process
$X_n$ are defined as follows:

$$R_X(n_1, n_2) = E[X(n_1)X(n_2)] \tag{9.12}$$

and

$$C_X(n_1, n_2) = E[\{X(n_1) - m_X(n_1)\}\{X(n_2) - m_X(n_2)\}] = R_X(n_1, n_2) - m_X(n_1)m_X(n_2). \tag{9.13}$$

Before proceeding to examples, we reiterate that the mean, autocorrelation, 
and autocovariance functions are only partial descriptions of a random process. Thus 
we will see later in the chapter that it is possible for two quite different random 
processes to have the same mean, autocorrelation, and autocovariance functions. 


Example 9.9 Sinusoid with Random Amplitude 


Let $X(t) = A \cos 2\pi t$, where A is some random variable (see Fig. 9.2a). The mean of X(t) is
found using Eq. (4.30):

$$m_X(t) = E[A \cos 2\pi t] = E[A] \cos 2\pi t.$$

Note that the mean varies with t. In particular, note that the process is always zero for values of t
where $\cos 2\pi t = 0$.

The autocorrelation is

$$R_X(t_1, t_2) = E[A \cos 2\pi t_1 \, A \cos 2\pi t_2] = E[A^2] \cos 2\pi t_1 \cos 2\pi t_2,$$

and the autocovariance is then

$$\begin{aligned}
C_X(t_1, t_2) &= R_X(t_1, t_2) - m_X(t_1)m_X(t_2) \\
&= \{E[A^2] - E[A]^2\} \cos 2\pi t_1 \cos 2\pi t_2 \\
&= \mathrm{VAR}[A] \cos 2\pi t_1 \cos 2\pi t_2.
\end{aligned}$$
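
The result of Example 9.9 can be checked numerically for a particular choice of the amplitude. In the sketch below the amplitude distribution is supplied by the caller; the uniform amplitude on [0, 2] mentioned afterwards (variance 1/3) is an assumption made for illustration and is not part of the example. The function names, trial count, and seed are likewise hypothetical.

```python
import math
import random


def estimate_autocovariance(t1, t2, draw_amplitude, trials=50_000, seed=0):
    """Estimate C_X(t1, t2) for X(t) = A cos(2*pi*t) by averaging over draws of A."""
    rng = random.Random(seed)
    c1 = math.cos(2.0 * math.pi * t1)
    c2 = math.cos(2.0 * math.pi * t2)
    sum1 = sum2 = sum12 = 0.0
    for _ in range(trials):
        a = draw_amplitude(rng)
        x1, x2 = a * c1, a * c2
        sum1 += x1
        sum2 += x2
        sum12 += x1 * x2
    # autocorrelation estimate minus product of the mean estimates
    return sum12 / trials - (sum1 / trials) * (sum2 / trials)


def theoretical_autocovariance(t1, t2, var_a):
    """VAR[A] cos(2*pi*t1) cos(2*pi*t2), the closed form from the example."""
    return var_a * math.cos(2.0 * math.pi * t1) * math.cos(2.0 * math.pi * t2)
```

With `draw_amplitude = lambda rng: rng.uniform(0.0, 2.0)`, `t1 = 0.1`, and `t2 = 0.2`, the estimate should be close to `theoretical_autocovariance(0.1, 0.2, 1/3)`, which equals 1/12.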


Example 9.10 Sinusoid with Random Phase 


Let $X(t) = \cos(\omega t + \Theta)$, where $\Theta$ is uniformly distributed in the interval $(-\pi, \pi)$ (see Fig.
9.2b). The mean of X(t) is found using Eq. (4.30):

$$m_X(t) = E[\cos(\omega t + \Theta)] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \cos(\omega t + \theta)\, d\theta = 0.$$

The autocorrelation and autocovariance are then

$$\begin{aligned}
C_X(t_1, t_2) = R_X(t_1, t_2) &= E[\cos(\omega t_1 + \Theta)\cos(\omega t_2 + \Theta)] \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1}{2}\{\cos(\omega(t_1 - t_2)) + \cos(\omega(t_1 + t_2) + 2\theta)\}\, d\theta \\
&= \frac{1}{2}\cos(\omega(t_1 - t_2)),
\end{aligned}$$

where we used the identity cos(a) cos(b) = 1/2 cos(a + b) + 1/2 cos(a − b). Note that $m_X(t)$
is a constant and that $C_X(t_1, t_2)$ depends only on $|t_1 - t_2|$. Note as well that the samples at times
$t_1$ and $t_2$ are uncorrelated if $\omega(t_1 - t_2) = (2k + 1)\pi/2$, where k is any integer, since the
cosine vanishes at odd multiples of $\pi/2$.
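
The autocovariance of the random-phase sinusoid can be checked by averaging over many independent draws of the phase. The sketch below is an illustration under stated assumptions: the function name, trial count, and seed are hypothetical, and the phase is drawn with the standard library's uniform generator.

```python
import math
import random


def estimate_autocov_random_phase(omega, t1, t2, trials=100_000, seed=0):
    """Estimate C_X(t1, t2) for X(t) = cos(omega*t + phase), phase uniform in (-pi, pi)."""
    rng = random.Random(seed)
    sum1 = sum2 = sum12 = 0.0
    for _ in range(trials):
        phase = rng.uniform(-math.pi, math.pi)
        x1 = math.cos(omega * t1 + phase)
        x2 = math.cos(omega * t2 + phase)
        sum1 += x1
        sum2 += x2
        sum12 += x1 * x2
    # autocorrelation estimate minus product of the mean estimates
    return sum12 / trials - (sum1 / trials) * (sum2 / trials)
```

For $\omega = 2\pi$, the estimate at `t1 = 0.3`, `t2 = 0.1` should be near $\frac{1}{2}\cos(0.4\pi) \approx 0.155$, and the estimate at `t1 = 0.35`, `t2 = 0.1`, where $\omega(t_1 - t_2) = \pi/2$ and the cosine vanishes, should be near zero.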


Multiple Random Processes 


In most situations we deal with more than one random process at a time. For example, we may be interested in the temperatures at city a, $X(t)$, and city b, $Y(t)$. Another very common example involves a random process $X(t)$ that is the “input” to a system and another random process $Y(t)$ that is the “output” of the system. Naturally, we are interested in the interplay between $X(t)$ and $Y(t)$.

The joint behavior of two or more random processes is specified by the collection of joint distributions for all possible choices of time samples of the processes. Thus for a pair of continuous-valued random processes $X(t)$ and $Y(t)$ we must specify all possible joint density functions of $X(t_1), \ldots, X(t_k)$ and $Y(t_1'), \ldots, Y(t_j')$ for all $k$, $j$, and all choices of $t_1, \ldots, t_k$ and $t_1', \ldots, t_j'$. For example, the simplest joint pdf would be:

$$f_{X(t_1),Y(t_2)}(x, y)\, dx\, dy = P[x < X(t_1) \le x + dx,\ y < Y(t_2) \le y + dy].$$

Note that the time indices of $X(t)$ and $Y(t)$ need not be the same. For example, we may be interested in the input at time $t_1$ and the output at a later time $t_2$.

The random processes $X(t)$ and $Y(t)$ are said to be independent random processes if the vector random variables $\mathbf{X} = (X(t_1), \ldots, X(t_k))$ and $\mathbf{Y} = (Y(t_1'), \ldots, Y(t_j'))$ are independent for all $k$, $j$, and all choices of $t_1, \ldots, t_k$ and $t_1', \ldots, t_j'$:

$$F_{\mathbf{X},\mathbf{Y}}(x_1, \ldots, x_k, y_1, \ldots, y_j) = F_{\mathbf{X}}(x_1, \ldots, x_k)\, F_{\mathbf{Y}}(y_1, \ldots, y_j).$$

The cross-correlation $R_{X,Y}(t_1, t_2)$ of $X(t)$ and $Y(t)$ is defined by

$$R_{X,Y}(t_1, t_2) = E[X(t_1)Y(t_2)]. \qquad (9.14)$$

The processes $X(t)$ and $Y(t)$ are said to be orthogonal random processes if

$$R_{X,Y}(t_1, t_2) = 0 \quad \text{for all } t_1 \text{ and } t_2. \qquad (9.15)$$

The cross-covariance $C_{X,Y}(t_1, t_2)$ of $X(t)$ and $Y(t)$ is defined by

$$C_{X,Y}(t_1, t_2) = E[\{X(t_1) - m_X(t_1)\}\{Y(t_2) - m_Y(t_2)\}] = R_{X,Y}(t_1, t_2) - m_X(t_1)m_Y(t_2). \qquad (9.16)$$

The processes $X(t)$ and $Y(t)$ are said to be uncorrelated random processes if

$$C_{X,Y}(t_1, t_2) = 0 \quad \text{for all } t_1 \text{ and } t_2. \qquad (9.17)$$


Example 9.11 


Let $X(t) = \cos(\omega t + \Theta)$ and $Y(t) = \sin(\omega t + \Theta)$, where $\Theta$ is a random variable uniformly distributed in $[-\pi, \pi]$. Find the cross-covariance of $X(t)$ and $Y(t)$.

From Example 9.10 we know that $X(t)$ and $Y(t)$ are zero mean. From Eq. (9.16), the cross-covariance is then equal to the cross-correlation:

$$C_{X,Y}(t_1, t_2) = R_{X,Y}(t_1, t_2) = E[\cos(\omega t_1 + \Theta)\sin(\omega t_2 + \Theta)]$$

$$= E\left[-\frac{1}{2}\sin(\omega(t_1 - t_2)) + \frac{1}{2}\sin(\omega(t_1 + t_2) + 2\Theta)\right] = -\frac{1}{2}\sin(\omega(t_1 - t_2)),$$

since $E[\sin(\omega(t_1 + t_2) + 2\Theta)] = 0$. $X(t)$ and $Y(t)$ are not uncorrelated random processes because the cross-covariance is not equal to zero for all choices of time samples. Note, however, that $X(t_1)$ and $Y(t_2)$ are uncorrelated random variables for $t_1$ and $t_2$ such that $\omega(t_1 - t_2) = k\pi$, where $k$ is any integer.


Example 9.12 Signal Plus Noise 
Suppose process Y(t) consists of a desired signal X(t) plus noise N(t): 
Y(t) = X(t) + N(t). 
Find the cross-correlation between the observed signal and the desired signal assuming that $X(t)$ and $N(t)$ are independent random processes.

From Eq. (9.14), we have

$$R_{X,Y}(t_1, t_2) = E[X(t_1)Y(t_2)]$$

$$= E[X(t_1)\{X(t_2) + N(t_2)\}]$$

$$= R_X(t_1, t_2) + E[X(t_1)]E[N(t_2)]$$

$$= R_X(t_1, t_2) + m_X(t_1)m_N(t_2),$$

where the third equality followed from the fact that $X(t)$ and $N(t)$ are independent.
9.3 DISCRETE-TIME PROCESSES: SUM PROCESS, BINOMIAL COUNTING PROCESS, AND RANDOM WALK


In this section we introduce several important discrete-time random processes. We begin with the simplest class of random processes (independent, identically distributed sequences) and then consider the sum process that results from adding an iid sequence. We show that the sum process satisfies the independent increments property as well as the Markov property. Both of these properties greatly facilitate the calculation of joint probabilities. We also introduce the binomial counting process and the random walk process as special cases of sum processes.


9.3.1 iid Random Process


Let $X_n$ be a discrete-time random process consisting of a sequence of independent, identically distributed (iid) random variables with common cdf $F_X(x)$, mean $m$, and variance $\sigma^2$. The sequence $X_n$ is called the iid random process.

The joint cdf for any time instants $n_1, \ldots, n_k$ is given by

$$F_{X_1, \ldots, X_k}(x_1, x_2, \ldots, x_k) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k] = F_X(x_1)F_X(x_2)\cdots F_X(x_k), \qquad (9.18)$$

where, for simplicity, $X_k$ denotes $X_{n_k}$. Equation (9.18) implies that if $X_n$ is discrete-valued, the joint pmf factors into the product of individual pmf’s, and if $X_n$ is continuous-valued, the joint pdf factors into the product of the individual pdf’s.

The mean of an iid process is obtained from Eq. (9.4):

$$m_X(n) = E[X_n] = m \quad \text{for all } n. \qquad (9.19)$$

Thus, the mean is constant.

The autocovariance function is obtained from Eq. (9.6) as follows. If $n_1 \ne n_2$, then

$$C_X(n_1, n_2) = E[(X_{n_1} - m)(X_{n_2} - m)] = E[(X_{n_1} - m)]E[(X_{n_2} - m)] = 0,$$

since $X_{n_1}$ and $X_{n_2}$ are independent random variables. If $n_1 = n_2 = n$, then

$$C_X(n_1, n_2) = E[(X_n - m)^2] = \sigma^2.$$

We can express the autocovariance of the iid process in compact form as follows:

$$C_X(n_1, n_2) = \sigma^2 \delta_{n_1 n_2}, \qquad (9.20)$$

where $\delta_{n_1 n_2} = 1$ if $n_1 = n_2$, and 0 otherwise. Therefore the autocovariance function is zero everywhere except for $n_1 = n_2$. The autocorrelation function of the iid process is found from Eq. (9.7):

$$R_X(n_1, n_2) = C_X(n_1, n_2) + m^2. \qquad (9.21)$$
FIGURE 9.4
(a) Realization of a Bernoulli process. $I_n = 1$ indicates that a light bulb fails and is replaced on day $n$. (b) Realization of a binomial process. $S_n$ denotes the number of light bulbs that have failed up to time $n$.


Example 9.13 Bernoulli Random Process 


Let $I_n$ be a sequence of independent Bernoulli random variables. $I_n$ is then an iid random process taking on values from the set $\{0, 1\}$. A realization of such a process is shown in Fig. 9.4(a). For example, $I_n$ could be an indicator function for the event “a light bulb fails and is replaced on day $n$.”

Since $I_n$ is a Bernoulli random variable, it has mean and variance

$$m_I(n) = p, \qquad \mathrm{VAR}[I_n] = p(1 - p).$$

The independence of the $I_n$’s makes probabilities easy to compute. For example, the probability that the first four bits in the sequence are 1001 is

$$P[I_1 = 1, I_2 = 0, I_3 = 0, I_4 = 1] = P[I_1 = 1]P[I_2 = 0]P[I_3 = 0]P[I_4 = 1] = p^2(1 - p)^2.$$

Similarly, the probability that the second bit is 0 and the seventh is 1 is

$$P[I_2 = 0, I_7 = 1] = P[I_2 = 0]P[I_7 = 1] = (1 - p)p.$$

Example 9.14 Random Step Process 


An up-down counter is driven by $+1$ or $-1$ pulses. Let the input to the counter be given by $D_n = 2I_n - 1$, where $I_n$ is the Bernoulli random process, then

$$D_n = \begin{cases} +1 & \text{if } I_n = 1 \\ -1 & \text{if } I_n = 0. \end{cases}$$

For example, $D_n$ might represent the change in position of a particle that moves along a straight line in jumps of $\pm 1$ every time unit. A realization of $D_n$ is shown in Fig. 9.5(a).

FIGURE 9.5
(a) Realization of a random step process. $D_n = 1$ implies that the particle moves one step to the right at time $n$. (b) Realization of a random walk process. $S_n$ denotes the position of a particle at time $n$.

The mean of $D_n$ is

$$m_D(n) = E[D_n] = E[2I_n - 1] = 2E[I_n] - 1 = 2p - 1.$$

The variance of $D_n$ is found from Eqs. (4.37) and (4.38):

$$\mathrm{VAR}[D_n] = \mathrm{VAR}[2I_n - 1] = 2^2\, \mathrm{VAR}[I_n] = 4p(1 - p).$$

The probabilities of events involving $D_n$ are computed as in Example 9.13.


9.3.2 Independent Increments and Markov Properties of Random Processes


Before proceeding to build random processes from iid processes, we present two very useful properties of random processes. Let $X(t)$ be a random process and consider two time instants, $t_1 < t_2$. The increment of the random process in the interval $t_1 < t \le t_2$ is defined as $X(t_2) - X(t_1)$. A random process $X(t)$ is said to have independent increments if the increments in disjoint intervals are independent random variables, that is, for any $k$ and any choice of sampling instants $t_1 < t_2 < \cdots < t_k$, the associated increments

$$X(t_2) - X(t_1),\ X(t_3) - X(t_2),\ \ldots,\ X(t_k) - X(t_{k-1})$$

are independent random variables. In the next subsection, we show that the joint pdf (pmf) of $X(t_1), X(t_2), \ldots, X(t_k)$ is given by the product of the pdf (pmf) of $X(t_1)$ and the marginal pdf’s (pmf’s) of the individual increments.

Another useful property of random processes that allows us to readily obtain the joint probabilities is the Markov property. A random process $X(t)$ is said to be Markov if the future of the process given the present is independent of the past; that is, for any $k$ and any choice of sampling instants $t_1 < t_2 < \cdots < t_k$ and for any $x_1, x_2, \ldots, x_k$,

$$f_{X(t_k)}(x_k \mid X(t_{k-1}) = x_{k-1}, \ldots, X(t_1) = x_1) = f_{X(t_k)}(x_k \mid X(t_{k-1}) = x_{k-1}) \qquad (9.22)$$

if $X(t)$ is continuous-valued, and

$$P[X(t_k) = x_k \mid X(t_{k-1}) = x_{k-1}, \ldots, X(t_1) = x_1] = P[X(t_k) = x_k \mid X(t_{k-1}) = x_{k-1}] \qquad (9.23)$$

if $X(t)$ is discrete-valued. The expressions on the right-hand side of the above two equations are called the transition pdf and transition pmf, respectively. In the next sections we encounter several processes that satisfy the Markov property. Chapter 11 is entirely devoted to random processes that satisfy this property.

It is easy to show that a random process that has independent increments is also a Markov process. The converse is not true; that is, the Markov property does not imply independent increments.

FIGURE 9.6
The sum process $S_n = X_1 + \cdots + X_n$, $S_0 = 0$, can be generated in this way. [Block diagram: the input $X_n$ is added to the delayed output $S_{n-1}$ to produce $S_n = S_{n-1} + X_n$.]


9.3.3 Sum Processes: The Binomial Counting and Random Walk Processes


Many interesting random processes are obtained as the sum of a sequence of iid random variables, $X_1, X_2, \ldots$:

$$S_n = X_1 + X_2 + \cdots + X_n = S_{n-1} + X_n, \qquad n = 1, 2, \ldots \qquad (9.24)$$

where $S_0 = 0$. We call $S_n$ the sum process. The pdf or pmf of $S_n$ is found using the convolution or characteristic function methods presented in Section 7.1. Note that $S_n$ depends on the “past,” $S_1, \ldots, S_{n-1}$, only through $S_{n-1}$, that is, $S_n$ is independent of the past when $S_{n-1}$ is known. This can be seen clearly from Fig. 9.6, which shows a recursive procedure for computing $S_n$ in terms of $S_{n-1}$ and the increment $X_n$. Thus $S_n$ is a Markov process.
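The recursion in Eq. (9.24) translates directly into a running sum. The following is a minimal sketch, with an illustrative function name and a hand-picked input sequence that are not from the text, applied to a Bernoulli sequence $I_n$ and to the corresponding step sequence $D_n = 2I_n - 1$.

```python
from itertools import accumulate

def sum_process(increments):
    """Return [S_1, ..., S_n], where S_n = S_{n-1} + X_n and S_0 = 0."""
    return list(accumulate(increments))

# A hand-picked realization of a Bernoulli sequence (illustrative only).
bernoulli_trials = [1, 0, 0, 1, 1]
step_inputs = [2 * i - 1 for i in bernoulli_trials]  # D_n = 2*I_n - 1

counts = sum_process(bernoulli_trials)  # running number of successes
walk = sum_process(step_inputs)         # running position of the particle
print(counts)
print(walk)
```

Each new value is computed from the previous value and the current increment only, which is the structure behind the Markov property noted above.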


Example 9.15 Binomial Counting Process 


Let the $I_i$ be the sequence of independent Bernoulli random variables in Example 9.13, and let $S_n$ be the corresponding sum process. $S_n$ is then the counting process that gives the number of successes in the first $n$ Bernoulli trials. The sample function for $S_n$ corresponding to a particular sequence of $I_i$’s is shown in Fig. 9.4(b). Note that the counting process can only increase over time. Note as well that the binomial process can increase by at most one unit at a time. If $I_n$ indicates that a light bulb fails and is replaced on day $n$, then $S_n$ denotes the number of light bulbs that have failed up to day $n$.

Since $S_n$ is the sum of $n$ independent Bernoulli random variables, $S_n$ is a binomial random variable with parameters $n$ and $p = P[I = 1]$:

$$P[S_n = j] = \binom{n}{j} p^j (1 - p)^{n-j} \quad \text{for } 0 \le j \le n,$$

and zero otherwise. Thus $S_n$ has mean $np$ and variance $np(1 - p)$. Note that the mean and variance of this process grow linearly with time. This reflects the fact that as time progresses, that is, as $n$ grows, the range of values that can be assumed by the process increases. If $p > 0$ then we also know that $S_n$ has a tendency to grow steadily without bound over time.

The Markov property of the binomial counting process is easy to deduce. Given that the current value of the process at time $n - 1$ is $S_{n-1} = k$, the process at the next time instant will be $k$ with probability $1 - p$ or $k + 1$ with probability $p$. Once we know the value of the process at time $n - 1$, the values of the random process prior to time $n - 1$ are irrelevant.
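The binomial pmf of $S_n$ and its moments can be tabulated directly. The following is a minimal sketch; the function name and the values $n = 10$, $p = 0.3$ are illustrative assumptions, not from the text.

```python
from math import comb

def binomial_counting_pmf(n, p):
    """Return [P[S_n = 0], ..., P[S_n = n]] for the binomial counting process."""
    return [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]

n, p = 10, 0.3  # illustrative values
pmf = binomial_counting_pmf(n, p)
mean = sum(j * q for j, q in enumerate(pmf))
variance = sum((j - mean) ** 2 * q for j, q in enumerate(pmf))
print(mean, n * p)                 # mean of S_n against np
print(variance, n * p * (1 - p))   # variance of S_n against np(1 - p)
```

Repeating the calculation for larger $n$ shows the linear growth of the mean and variance with time.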


Example 9.16 One-Dimensional Random Walk 


Let $D_n$ be the iid process of $\pm 1$ random variables in Example 9.14, and let $S_n$ be the corresponding sum process. $S_n$ can represent the position of a particle at time $n$. The random process $S_n$ is an example of a one-dimensional random walk. A sample function of $S_n$ is shown in Fig. 9.5(b). Unlike the binomial process, the random walk can increase or decrease over time. The random walk process changes by one unit at a time.

The pmf of $S_n$ is found as follows. If there are $k$ “$+1$”s in the first $n$ trials, then there are $n - k$ “$-1$”s, and $S_n = k - (n - k) = 2k - n$. Conversely, $S_n = j$ if the number of $+1$’s is $k = (j + n)/2$. If $(j + n)/2$ is not an integer, then $S_n$ cannot equal $j$. Thus

$$P[S_n = 2k - n] = \binom{n}{k} p^k (1 - p)^{n-k} \quad \text{for } k \in \{0, 1, \ldots, n\}.$$

Since $k$ is the number of successes in $n$ Bernoulli trials, the mean of the random walk is:

$$E[S_n] = 2np - n = n(2p - 1).$$


As time progresses, the random walk can fluctuate over an increasingly broader range of posi- 
tive and negative values. S,, has a tendency to either grow if p > 1/2, or to decrease if p < 1/2. 
The case p = 1/2 provides a precarious balance, and we will see later, in Chapter 12, very inter- 
esting dynamics. Figure 9.7(a) shows the first 100 steps from a sample function of the random 
walk with p = 1/2. Figure 9.7(b) shows four sample functions of the random walk process with 
p = 1/2 for 1000 steps. Figure 9.7(c) shows four sample functions in the asymmetric case where 
p = 3/4. Note the strong linear growth trend in the process. 
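Sample functions like those in Fig. 9.7 can be generated with a few lines of code. The following is a minimal sketch, assuming Python's standard pseudo-random generator with an arbitrary seed; the function names are illustrative and not from the text. It also tabulates the pmf of $S_n$ on the lattice $2k - n$.

```python
import random
from math import comb

def random_walk(n, p, rng):
    """Return one sample path [S_1, ..., S_n] with P[step = +1] = p."""
    position, path = 0, []
    for _ in range(n):
        position += 1 if rng.random() < p else -1
        path.append(position)
    return path

def random_walk_pmf(n, p):
    """Map each reachable position 2k - n to P[S_n = 2k - n]."""
    return {2 * k - n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

rng = random.Random(7)  # arbitrary seed, for repeatability only
path = random_walk(100, 0.5, rng)
pmf = random_walk_pmf(100, 0.75)
mean = sum(pos * prob for pos, prob in pmf.items())
print(path[:10])
print(mean)  # compare with n(2p - 1) = 50
```

Only positions with the same parity as $n$ appear as keys of the pmf, which reflects the fact that $S_n$ cannot equal $j$ when $(j + n)/2$ is not an integer.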


The sum process $S_n$ has independent increments in nonoverlapping time intervals. To see this consider two time intervals: $n_0 < n \le n_1$ and $n_2 < n \le n_3$, where $n_1 \le n_2$. The increments of $S_n$ in these disjoint time intervals are given by

$$S_{n_1} - S_{n_0} = X_{n_0+1} + \cdots + X_{n_1}$$

$$S_{n_3} - S_{n_2} = X_{n_2+1} + \cdots + X_{n_3}. \qquad (9.25)$$

The above increments do not have any of the $X_n$’s in common, so the independence of the $X_n$’s implies that the increments $(S_{n_1} - S_{n_0})$ and $(S_{n_3} - S_{n_2})$ are independent random variables.

FIGURE 9.7
(a) Random walk process with $p = 1/2$. (b) Four sample functions of symmetric random walk process with $p = 1/2$. (c) Four sample functions of asymmetric random walk with $p = 3/4$.

For $n' > n$, the increment $S_{n'} - S_n$ is the sum of $n' - n$ iid random variables, so it has the same distribution as $S_{n'-n}$, the sum of the first $n' - n$ $X$’s, that is,

$$P[S_{n'} - S_n = y] = P[S_{n'-n} = y]. \qquad (9.26)$$

Thus increments in intervals of the same length have the same distribution regardless of when the interval begins. For this reason, we also say that $S_n$ has stationary increments.


Example 9.17 Independent and Stationary Increments of Binomial Process 
and Random Walk 


The independent and stationary increments property is particularly easy to see for the binomial 
process since the increments in an interval are the number of successes in the corresponding 
Bernoulli trials. The independent increment property follows from the fact that the numbers of 
successes in disjoint time intervals are independent. The stationary increments property follows 
from the fact that the pmf for the increment in a time interval is the binomial pmf with the cor- 
responding number of trials. 

The increment in a random walk process is determined by the same number of successes 
as a binomial process. It then follows that the random walk also has independent and stationary 
increments. 


The independent and stationary increments property of the sum process $S_n$ makes it easy to compute the joint pmf/pdf for any number of time instants. For simplicity, suppose that the $X_n$ are integer-valued, so $S_n$ is also integer-valued. We compute the joint pmf of $S_n$ at times $n_1$, $n_2$, and $n_3$:

$$P[S_{n_1} = y_1, S_{n_2} = y_2, S_{n_3} = y_3] = P[S_{n_1} = y_1, S_{n_2} - S_{n_1} = y_2 - y_1, S_{n_3} - S_{n_2} = y_3 - y_2], \qquad (9.27)$$

since the process is equal to $y_1$, $y_2$, and $y_3$ at times $n_1$, $n_2$, and $n_3$, if and only if it is equal to $y_1$ at time $n_1$, and the subsequent increments are $y_2 - y_1$ and $y_3 - y_2$. The independent increments property then implies that

$$P[S_{n_1} = y_1, S_{n_2} = y_2, S_{n_3} = y_3] = P[S_{n_1} = y_1]\, P[S_{n_2} - S_{n_1} = y_2 - y_1]\, P[S_{n_3} - S_{n_2} = y_3 - y_2]. \qquad (9.28)$$

Finally, the stationary increments property implies that the joint pmf of $S_n$ is given by:

$$P[S_{n_1} = y_1, S_{n_2} = y_2, S_{n_3} = y_3] = P[S_{n_1} = y_1]\, P[S_{n_2 - n_1} = y_2 - y_1]\, P[S_{n_3 - n_2} = y_3 - y_2].$$

Clearly, we can use this procedure to write the joint pmf of $S_n$ at any time instants $n_1 < n_2 < \cdots < n_k$ in terms of the pmf at the initial time instant and the pmf’s of the subsequent increments:

$$P[S_{n_1} = y_1, S_{n_2} = y_2, \ldots, S_{n_k} = y_k] = P[S_{n_1} = y_1]\, P[S_{n_2 - n_1} = y_2 - y_1] \cdots P[S_{n_k - n_{k-1}} = y_k - y_{k-1}]. \qquad (9.29)$$

If the $X_n$ are continuous-valued random variables, then it can be shown that the joint density of $S_n$ at times $n_1, n_2, \ldots, n_k$ is:

$$f_{S_{n_1}, S_{n_2}, \ldots, S_{n_k}}(y_1, y_2, \ldots, y_k) = f_{S_{n_1}}(y_1)\, f_{S_{n_2 - n_1}}(y_2 - y_1) \cdots f_{S_{n_k - n_{k-1}}}(y_k - y_{k-1}). \qquad (9.30)$$


Example 9.18 Joint pmf of Binomial Counting Process 


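Find the joint pmf for the binomial counting process at times $n_1$ and $n_2$. Find the probability $P[S_{n_1} = 0, S_{n_2} = n_2 - n_1]$, that is, the probability that the first $n_1$ trials are failures and the remaining trials are all successes.

Following the above approach we have

$$P[S_{n_1} = y_1, S_{n_2} = y_2] = P[S_{n_1} = y_1]\, P[S_{n_2} - S_{n_1} = y_2 - y_1]$$

$$= \binom{n_2 - n_1}{y_2 - y_1} p^{y_2 - y_1}(1 - p)^{n_2 - n_1 - y_2 + y_1} \binom{n_1}{y_1} p^{y_1}(1 - p)^{n_1 - y_1}$$

$$= \binom{n_2 - n_1}{y_2 - y_1}\binom{n_1}{y_1} p^{y_2}(1 - p)^{n_2 - y_2}.$$

The requested probability is then:

$$P[S_{n_1} = 0, S_{n_2} = n_2 - n_1] = \binom{n_2 - n_1}{n_2 - n_1}\binom{n_1}{0} p^{n_2 - n_1}(1 - p)^{n_1} = p^{n_2 - n_1}(1 - p)^{n_1},$$

which is what we would obtain from a direct calculation for Bernoulli trials.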
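The factorization in Example 9.18 is straightforward to evaluate numerically. The following is a minimal sketch; the function names and the values $n_1 = 3$, $n_2 = 7$, $p = 0.4$ are illustrative assumptions, not from the text.

```python
from math import comb

def binomial_pmf(n, p, j):
    """P[S_n = j] for the binomial counting process (zero outside 0..n)."""
    if j < 0 or j > n:
        return 0.0
    return comb(n, j) * p**j * (1 - p) ** (n - j)

def joint_pmf(n1, y1, n2, y2, p):
    """P[S_n1 = y1, S_n2 = y2] for n1 < n2, using independent and
    stationary increments: P[S_n1 = y1] * P[S_(n2-n1) = y2 - y1]."""
    return binomial_pmf(n1, p, y1) * binomial_pmf(n2 - n1, p, y2 - y1)

n1, n2, p = 3, 7, 0.4  # illustrative values
requested = joint_pmf(n1, 0, n2, n2 - n1, p)
direct = p ** (n2 - n1) * (1 - p) ** n1  # n1 failures, then n2 - n1 successes
print(requested, direct)
```

Summing `joint_pmf` over all pairs $(y_1, y_2)$ should return 1, which is a quick sanity check on the factorization.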


Example 9.19 Joint pdf of Sum of iid Gaussian Sequence 


Let $X_n$ be a sequence of iid Gaussian random variables with zero mean and variance $\sigma^2$. Find the joint pdf of the corresponding sum process at times $n_1$ and $n_2$.

From Example 7.3, we know that $S_n$ is a Gaussian random variable with mean zero and variance $n\sigma^2$. The joint pdf of $S_n$ at times $n_1$ and $n_2$ is given by

$$f_{S_{n_1}, S_{n_2}}(y_1, y_2) = f_{S_{n_2 - n_1}}(y_2 - y_1)\, f_{S_{n_1}}(y_1)$$

$$= \frac{1}{\sqrt{2\pi(n_2 - n_1)\sigma^2}}\, e^{-(y_2 - y_1)^2/[2(n_2 - n_1)\sigma^2]}\; \frac{1}{\sqrt{2\pi n_1 \sigma^2}}\, e^{-y_1^2/(2 n_1 \sigma^2)}.$$


Since the sum process $S_n$ is the sum of $n$ iid random variables, it has mean and variance:

$$m_S(n) = E[S_n] = nE[X] = nm \qquad (9.31)$$

$$\mathrm{VAR}[S_n] = n\, \mathrm{VAR}[X] = n\sigma^2. \qquad (9.32)$$

The property of independent increments allows us to compute the autocovariance in an interesting way. Suppose $n \le k$ so $n = \min(n, k)$, then

$$C_S(n, k) = E[(S_n - nm)(S_k - km)]$$

$$= E[(S_n - nm)\{(S_n - nm) + (S_k - km) - (S_n - nm)\}]$$

$$= E[(S_n - nm)^2] + E[(S_n - nm)(S_k - S_n - (k - n)m)].$$

Since $S_n$ and the increment $S_k - S_n$ are independent,

$$C_S(n, k) = E[(S_n - nm)^2] + E[(S_n - nm)]\,E[(S_k - S_n - (k - n)m)]$$

$$= E[(S_n - nm)^2] = \mathrm{VAR}[S_n] = n\sigma^2,$$

since $E[S_n - nm] = 0$. Similarly, if $k = \min(n, k)$, we would have obtained $k\sigma^2$. Therefore the autocovariance of the sum process is

$$C_S(n, k) = \min(n, k)\,\sigma^2. \qquad (9.33)$$


Example 9.20 Autocovariance of Random Walk 


Find the autocovariance of the one-dimensional random walk.

From Example 9.14 and Eqs. (9.32) and (9.33), $S_n$ has mean $n(2p - 1)$ and variance $4np(1 - p)$. Thus its autocovariance is given by

$$C_S(n, k) = \min(n, k)\, 4p(1 - p).$$
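For short walks the autocovariance can be computed exactly by enumerating every possible sequence of $\pm 1$ steps. The following is a minimal sketch of such a check; the function name and the values $n = 3$, $k = 5$, $p = 0.7$ are illustrative assumptions, not from the text.

```python
from itertools import product

def random_walk_autocovariance(n, k, p):
    """Exact C_S(n, k) for the random walk, by enumerating all step sequences."""
    horizon = max(n, k)
    e_n = e_k = e_nk = 0.0
    for steps in product((1, -1), repeat=horizon):
        ups = steps.count(1)
        prob = p**ups * (1 - p) ** (horizon - ups)  # probability of this sequence
        s_n, s_k = sum(steps[:n]), sum(steps[:k])
        e_n += prob * s_n
        e_k += prob * s_k
        e_nk += prob * s_n * s_k
    return e_nk - e_n * e_k  # E[S_n S_k] - E[S_n] E[S_k]

n, k, p = 3, 5, 0.7  # illustrative values
exact = random_walk_autocovariance(n, k, p)
formula = min(n, k) * 4 * p * (1 - p)
print(exact, formula)
```

The enumeration grows as $2^{\max(n,k)}$, so this is only practical for small $n$ and $k$; it is meant as a check on the formula rather than a way to compute it.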


FIGURE 9.8
(a) First-order autoregressive process; (b) Moving average process.

The sum process can be generalized in a number of ways. For example, the recursive structure in Fig. 9.6 can be modified as shown in Fig. 9.8(a). We then obtain first-order autoregressive random processes, which are of interest in time series analysis and in digital signal processing. If instead we use the structure shown in Fig. 9.8(b), we obtain an example of a moving average process. We investigate these processes in Chapter 10.


9.4 POISSON AND ASSOCIATED RANDOM PROCESSES


In this section we develop the Poisson random process, which plays an important role in models that involve counting of events and that find application in areas such as queueing systems and reliability analysis. We show how the continuous-time Poisson random process can be obtained as the limit of a discrete-time process. We also introduce several random processes that are derived from the Poisson process.


9.4.1 Poisson Process


Consider a situation in which events occur at random instants of time at an average rate of $\lambda$ events per second. For example, an event could represent the arrival of a customer to a service station or the breakdown of a component in some system. Let $N(t)$ be the number of event occurrences in the time interval $[0, t]$. $N(t)$ is then a nondecreasing, integer-valued, continuous-time random process as shown in Fig. 9.9.


FIGURE 9.9
A sample path of the Poisson counting process. The event occurrence times are denoted by $S_1, S_2, \ldots$. The $j$th interevent time is denoted by $X_j = S_j - S_{j-1}$.
Suppose that the interval $[0, t]$ is divided into $n$ subintervals of very short duration $\delta = t/n$. Assume that the following two conditions hold:


1. The probability of more than one event occurrence in a subinterval is negligible 
compared to the probability of observing one or zero events. 

2. Whether or not an event occurs in a subinterval is independent of the outcomes 
in other subintervals. 


The first assumption implies that the outcome in each subinterval can be viewed as a 
Bernoulli trial. The second assumption implies that these Bernoulli trials are indepen- 
dent. The two assumptions together imply that the counting process N(t) can be ap- 
proximated by the binomial counting process discussed in the previous section. 

If the probability of an event occurrence in each subinterval is $p$, then the expected number of event occurrences in the interval $[0, t]$ is $np$. Since events occur at a rate of $\lambda$ events per second, the average number of events in the interval $[0, t]$ is $\lambda t$. Thus we must have that

$$\lambda t = np.$$

If we now let $n \to \infty$ (i.e., $\delta = t/n \to 0$) and $p \to 0$ while $np = \lambda t$ remains fixed, then from Eq. (3.40) the binomial distribution approaches a Poisson distribution with parameter $\lambda t$. We therefore conclude that the number of event occurrences $N(t)$ in the interval $[0, t]$ has a Poisson distribution with mean $\lambda t$:

$$P[N(t) = k] = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t} \quad \text{for } k = 0, 1, \ldots \qquad (9.34a)$$

For this reason $N(t)$ is called the Poisson process. The mean function and the variance function of the Poisson process are given by:

$$m_N(t) = E[N(t)] = \lambda t \quad \text{and} \quad \mathrm{VAR}[N(t)] = \lambda t. \qquad (9.34b)$$


In Section 11.3 we rederive the Poisson process using results from Markov chain 
theory. 

The process $N(t)$ inherits the property of independent and stationary increments from the underlying binomial process. First, the distribution for the number of event occurrences in any interval of length $t$ is given by Eq. (9.34a). Next, the independent and stationary increments property allows us to write the joint pmf for $N(t)$ at any number of points. For example, for $t_1 < t_2$,

$$P[N(t_1) = i, N(t_2) = j] = P[N(t_1) = i]\, P[N(t_2) - N(t_1) = j - i] = P[N(t_1) = i]\, P[N(t_2 - t_1) = j - i]$$

$$= \frac{(\lambda t_1)^i e^{-\lambda t_1}}{i!}\; \frac{(\lambda(t_2 - t_1))^{j-i} e^{-\lambda(t_2 - t_1)}}{(j - i)!}. \qquad (9.35a)$$

The independent increments property also allows us to calculate the autocovariance of $N(t)$. For $t_1 \le t_2$:

$$C_N(t_1, t_2) = E[(N(t_1) - \lambda t_1)(N(t_2) - \lambda t_2)]$$

$$= E[(N(t_1) - \lambda t_1)\{N(t_2) - N(t_1) - \lambda t_2 + \lambda t_1 + (N(t_1) - \lambda t_1)\}]$$

$$= E[(N(t_1) - \lambda t_1)]\, E[N(t_2) - N(t_1) - \lambda(t_2 - t_1)] + \mathrm{VAR}[N(t_1)]$$

$$= \mathrm{VAR}[N(t_1)] = \lambda t_1. \qquad (9.35b)$$
Example 9.21 


Inquiries arrive at a recorded message device according to a Poisson process of rate 15 inquiries 
per minute. Find the probability that in a 1-minute period, 3 inquiries arrive during the first 10 
seconds and 2 inquiries arrive during the last 15 seconds. 

The arrival rate in seconds is $\lambda = 15/60 = 1/4$ inquiries per second. Writing time in seconds, the probability of interest is

$$P[N(10) = 3 \text{ and } N(60) - N(45) = 2].$$

By applying first the independent increments property, and then the stationary increments property, we obtain

$$P[N(10) = 3 \text{ and } N(60) - N(45) = 2] = P[N(10) = 3]\, P[N(60) - N(45) = 2]$$

$$= P[N(10) = 3]\, P[N(60 - 45) = 2] = \frac{(10/4)^3 e^{-10/4}}{3!}\; \frac{(15/4)^2 e^{-15/4}}{2!}.$$
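The expression in Example 9.21 evaluates to a little more than 0.035. The following is a minimal sketch of the calculation; the helper name is an illustrative assumption, not from the text.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P[N = k] for a Poisson random variable with the given mean."""
    return mean**k * exp(-mean) / factorial(k)

rate = 15 / 60  # inquiries per second
# Independent increments: multiply the probabilities of the two disjoint intervals.
# Stationary increments: each factor depends only on the interval length.
prob = poisson_pmf(3, rate * 10) * poisson_pmf(2, rate * 15)
print(prob)
```

The two factors depend only on the lengths of the intervals (10 and 15 seconds), not on where the intervals sit within the minute.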


Consider the time $T$ between event occurrences in a Poisson process. Again suppose that the time interval $[0, t]$ is divided into $n$ subintervals of length $\delta = t/n$. The probability that the interevent time $T$ exceeds $t$ seconds is equivalent to no event occurring in $t$ seconds (or in $n$ Bernoulli trials):

$$P[T > t] = P[\text{no events in } t \text{ seconds}] = (1 - p)^n = \left(1 - \frac{\lambda t}{n}\right)^n \to e^{-\lambda t} \quad \text{as } n \to \infty. \qquad (9.36)$$

Equation (9.36) implies that $T$ is an exponential random variable with parameter $\lambda$. Since the times between event occurrences in the underlying binomial process are independent geometric random variables, it follows that the sequence of interevent times in a Poisson process is composed of independent random variables. We therefore conclude that the interevent times in a Poisson process form an iid sequence of exponential random variables with mean $1/\lambda$.
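The limit in Eq. (9.36) can be observed numerically by refining the subdivision. The following is a minimal sketch; the function name and the values $\lambda = 1/4$, $t = 4$ are illustrative assumptions, not from the text.

```python
from math import exp

def prob_no_event(rate, t, n):
    """(1 - p)^n with p = rate*t/n: no event in n Bernoulli subintervals of [0, t]."""
    return (1 - rate * t / n) ** n

rate, t = 0.25, 4.0  # illustrative values, so that rate*t = 1
for n in (10, 100, 1000, 10_000):
    print(n, prob_no_event(rate, t, n))
print("limit", exp(-rate * t))
```

The printed values increase toward $e^{-\lambda t}$ as $n$ grows, which is the exponential tail of the interevent time.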
Another quantity of interest is the time $S_n$ at which the $n$th event occurs in a Poisson process. Let $T_j$ denote the iid exponential interarrival times, then

$$S_n = T_1 + T_2 + \cdots + T_n.$$

In Example 7.5, we saw that the sum of $n$ iid exponential random variables has an Erlang distribution. Thus $S_n$ is an Erlang random variable with pdf:

$$f_{S_n}(y) = \frac{(\lambda y)^{n-1}}{(n - 1)!}\, \lambda e^{-\lambda y} \quad \text{for } y \ge 0. \qquad (9.37)$$


Example 9.22 


Find the mean and variance of the time until the tenth inquiry in Example 9.21.

The arrival rate is $\lambda = 1/4$ inquiries per second, so the interarrival times are exponential random variables with parameter $\lambda$. From Table 4.1, the mean and variance of exponential interarrival times are then $1/\lambda$ and $1/\lambda^2$, respectively. The time of the tenth arrival is the sum of ten such iid random variables, thus

$$E[S_{10}] = 10E[T] = \frac{10}{\lambda} = 40 \text{ sec}$$

$$\mathrm{VAR}[S_{10}] = 10\, \mathrm{VAR}[T] = \frac{10}{\lambda^2} = 160 \text{ sec}^2.$$


In applications where the Poisson process models customer interarrival times, it is customary to say that arrivals occur “at random.” We now explain what is meant by this statement. Suppose that we are given that only one arrival occurred in an interval $[0, t]$ and we let $X$ be the arrival time of the single customer. For $0 < x < t$, $N(x)$ is the number of events up to time $x$, and $N(t) - N(x)$ is the increment in the interval $(x, t]$, then:

$$P[X \le x] = P[N(x) = 1 \mid N(t) = 1]$$

$$= \frac{P[N(x) = 1 \text{ and } N(t) = 1]}{P[N(t) = 1]}$$

$$= \frac{P[N(x) = 1 \text{ and } N(t) - N(x) = 0]}{P[N(t) = 1]}$$

$$= \frac{P[N(x) = 1]\, P[N(t) - N(x) = 0]}{P[N(t) = 1]}$$

$$= \frac{\lambda x e^{-\lambda x}\, e^{-\lambda(t - x)}}{\lambda t e^{-\lambda t}} = \frac{x}{t}. \qquad (9.38)$$

Equation (9.38) implies that given that one arrival has occurred in the interval $[0, t]$, then the customer arrival time is uniformly distributed in the interval $[0, t]$. It is in this sense that customer arrival times occur “at random.” It can be shown that if the number of arrivals in the interval $[0, t]$ is $k$, then the individual arrival times are distributed independently and uniformly in the interval.


Example 9.23


Suppose two customers arrive at a shop during a two-minute period. Find the probability that both customers arrived during the first minute.

The arrival times of the customers are independent and uniformly distributed in the two-minute interval. Each customer arrives during the first minute with probability 1/2. Thus the probability that both arrive during the first minute is $(1/2)^2 = 1/4$. This answer can be verified by showing that $P[N(1) = 2 \mid N(2) = 2] = 1/4$.


9.4.2 Random Telegraph Signal and Other Processes Derived from the Poisson Process


Many processes are derived from the Poisson process. In this section, we present two 
examples of such random processes. 


Example 9.24 Random Telegraph Signal 


Consider a random process $X(t)$ that assumes the values $\pm 1$. Suppose that $X(0) = +1$ or $-1$ with probability 1/2, and suppose that $X(t)$ changes polarity with each occurrence of an event in a Poisson process of rate $\alpha$. Figure 9.10 shows a sample function of $X(t)$.

FIGURE 9.10
Sample path of a random telegraph signal. The times between transitions $X_j$ are iid exponential random variables.

The pmf of $X(t)$ is given by

$$P[X(t) = \pm 1] = P[X(t) = \pm 1 \mid X(0) = 1]\,P[X(0) = 1] + P[X(t) = \pm 1 \mid X(0) = -1]\,P[X(0) = -1]. \qquad (9.39)$$

The conditional pmf’s are found by noting that $X(t)$ will have the same polarity as $X(0)$ only when an even number of events occur in the interval $(0, t]$. Thus

$$P[X(t) = \pm 1 \mid X(0) = \pm 1] = P[N(t) = \text{even integer}] = \sum_{j=0}^{\infty} \frac{(\alpha t)^{2j}}{(2j)!}\, e^{-\alpha t}$$

$$= e^{-\alpha t}\, \frac{1}{2}\{e^{\alpha t} + e^{-\alpha t}\} = \frac{1}{2}(1 + e^{-2\alpha t}). \qquad (9.40)$$

$X(t)$ and $X(0)$ will differ in sign if the number of events in $t$ is odd:

$$P[X(t) = \pm 1 \mid X(0) = \mp 1] = \sum_{j=0}^{\infty} \frac{(\alpha t)^{2j+1}}{(2j + 1)!}\, e^{-\alpha t} = e^{-\alpha t}\, \frac{1}{2}\{e^{\alpha t} - e^{-\alpha t}\} = \frac{1}{2}(1 - e^{-2\alpha t}). \qquad (9.41)$$

We obtain the pmf for $X(t)$ by substituting into Eq. (9.39):

$$P[X(t) = 1] = \frac{1}{2}\,\frac{1}{2}\{1 + e^{-2\alpha t}\} + \frac{1}{2}\,\frac{1}{2}\{1 - e^{-2\alpha t}\} = \frac{1}{2}$$

$$P[X(t) = -1] = 1 - P[X(t) = 1] = \frac{1}{2}. \qquad (9.42)$$

Thus the random telegraph signal is equally likely to be $\pm 1$ at any time $t > 0$.

The mean and variance of $X(t)$ are

$$m_X(t) = 1\,P[X(t) = 1] + (-1)\,P[X(t) = -1] = 0$$

$$\mathrm{VAR}[X(t)] = E[X(t)^2] = (1)^2 P[X(t) = 1] + (-1)^2 P[X(t) = -1] = 1. \qquad (9.43)$$

The autocovariance of $X(t)$ is found as follows:

$$C_X(t_1, t_2) = E[X(t_1)X(t_2)] = 1\,P[X(t_1) = X(t_2)] + (-1)\,P[X(t_1) \ne X(t_2)]$$

$$= \frac{1}{2}\{1 + e^{-2\alpha|t_2 - t_1|}\} - \frac{1}{2}\{1 - e^{-2\alpha|t_2 - t_1|}\} = e^{-2\alpha|t_2 - t_1|}. \qquad (9.44)$$

Thus time samples of $X(t)$ become less and less correlated as the time between them increases.
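The closed forms in Example 9.24 can be checked by summing the Poisson pmf over the even counts directly. The following is a minimal sketch; the function names, the truncation at 40 terms, and the values $\alpha = 1.5$, $t = 0.8$ are illustrative assumptions, not from the text.

```python
from math import exp, factorial

def prob_even_count(alpha, t, terms=40):
    """P[N(t) is even] for a Poisson process of rate alpha (truncated series)."""
    mean = alpha * t
    return sum(mean ** (2 * j) * exp(-mean) / factorial(2 * j) for j in range(terms))

def telegraph_autocovariance(alpha, t1, t2):
    """C_X(t1, t2) = P[same sign] - P[opposite sign] over a gap of |t2 - t1|."""
    same = prob_even_count(alpha, abs(t2 - t1))
    return same - (1 - same)

alpha, t = 1.5, 0.8  # illustrative values
print(prob_even_count(alpha, t), 0.5 * (1 + exp(-2 * alpha * t)))
print(telegraph_autocovariance(alpha, 0.2, 1.0), exp(-2 * alpha * 0.8))
```

The truncation is harmless for moderate values of $\alpha t$, since the Poisson terms decay factorially; for very large $\alpha t$ more terms (or a different summation scheme) would be needed.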


The Poisson process and the random telegraph processes are examples of the continuous-time Markov chain processes that are discussed in Chapter 11.


Example 9.25 Filtered Poisson Impulse Train 


The Poisson process is zero at $t = 0$ and increases by one unit at the random arrival times $S_j$, $j = 1, 2, \ldots$. Thus the Poisson process can be expressed as the sum of randomly shifted step functions:

$$N(t) = \sum_{i=1}^{\infty} u(t - S_i), \qquad N(0) = 0,$$

where the $S_i$ are the arrival times.

Since the integral of a delta function $\delta(t - S)$ is a step function $u(t - S)$, we can view $N(t)$ as the result of integrating a train of delta functions that occur at times $S_n$, as shown in Fig. 9.11(a):

$$Z(t) = \sum_{i=1}^{\infty} \delta(t - S_i).$$

FIGURE 9.11
(a) Poisson process as integral of train of delta functions. (b) Filtered train of delta functions.

We can obtain other continuous-time processes by replacing the step function by another function $h(t)$,¹ as shown in Fig. 9.11(b):

$$X(t) = \sum_{i=1}^{\infty} h(t - S_i). \qquad (9.45)$$

For example, $h(t)$ could represent the current pulse that results when a photoelectron hits a detector. $X(t)$ is then the total current flowing at time $t$. $X(t)$ is called a shot noise process.


¹This is equivalent to passing $Z(t)$ through a linear system whose response to a delta function is $h(t)$.
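The following example shows how the properties of the Poisson process can be used to evaluate averages involving the filtered process.


Example 9.26 Mean of Shot Noise Process


Find the expected value of the shot noise process $X(t)$.

We condition on $N(t)$, the number of impulses that have occurred up to time $t$:

$$E[X(t)] = E[E[X(t) \mid N(t)]].$$

Suppose $N(t) = k$, then

$$E[X(t) \mid N(t) = k] = E\left[\sum_{j=1}^{k} h(t - S_j)\right] = \sum_{j=1}^{k} E[h(t - S_j)].$$

Since the arrival times, $S_1, \ldots, S_k$, when the impulses occurred are independent, uniformly distributed in the interval $[0, t]$,

$$E[h(t - S_j)] = \int_0^t h(t - s)\, \frac{ds}{t} = \frac{1}{t}\int_0^t h(u)\, du.$$

Thus

$$E[X(t) \mid N(t) = k] = \frac{k}{t}\int_0^t h(u)\, du,$$

and

$$E[X(t) \mid N(t)] = \frac{N(t)}{t}\int_0^t h(u)\, du.$$

Finally, we obtain

$$E[X(t)] = E[E[X(t) \mid N(t)]] = \frac{E[N(t)]}{t}\int_0^t h(u)\, du = \lambda \int_0^t h(u)\, du, \qquad (9.46)$$

where we used the fact that $E[N(t)] = \lambda t$. Note that $E[X(t)]$ approaches a constant value as $t$ becomes large if the above integral is finite.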
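As an illustration of Eq. (9.46), suppose (purely as an assumed example, not taken from the text) that the pulse shape is a decaying exponential, $h(t) = e^{-\beta t}$ for $t \ge 0$ with $\beta > 0$. The mean of the shot noise process then works out as follows:

```latex
E[X(t)] = \lambda \int_0^t e^{-\beta u}\, du
        = \frac{\lambda}{\beta}\left(1 - e^{-\beta t}\right)
        \;\longrightarrow\; \frac{\lambda}{\beta}
        \quad \text{as } t \to \infty .
```

In this case the integral of $h$ is finite, so the mean settles to the constant $\lambda/\beta$, the arrival rate multiplied by the area under a single pulse.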


9.5 GAUSSIAN RANDOM PROCESSES, WIENER PROCESS, AND BROWNIAN MOTION 


In this section we continue the introduction of important random processes. First, we 
introduce the class of Gaussian random processes which find many important applica- 
tions in electrical engineering. We then develop an example of a Gaussian random 
process: the Wiener random process which is used to model Brownian motion. 


9.5.1 Gaussian Random Processes


A random process X(t) is a Gaussian random process if the samples X, = X(t,), 
X = X(ty),..., Xk = X(t.) are jointly Gaussian random variables for all k, and all 
choices of t,,...,¢. This definition applies to both discrete-time and continuous- 
time processes. Recall from Eq. (6.42) that the joint pdf of jointly Gaussian random 
variables is determined by the vector of means and by the covariance matrix: 


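$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k) = \frac{e^{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T K^{-1} (\mathbf{x} - \mathbf{m})}}{(2\pi)^{k/2}\,|K|^{1/2}}. \tag{9.47a}$$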


In the case of Gaussian random processes, the mean vector and the covariance matrix 
are the values of the mean function and covariance function at the corresponding time 
instants: 


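$$\mathbf{m} = \begin{bmatrix} m_X(t_1) \\ \vdots \\ m_X(t_k) \end{bmatrix} \qquad K = \begin{bmatrix} C_X(t_1, t_1) & C_X(t_1, t_2) & \cdots & C_X(t_1, t_k) \\ C_X(t_2, t_1) & C_X(t_2, t_2) & \cdots & C_X(t_2, t_k) \\ \vdots & \vdots & & \vdots \\ C_X(t_k, t_1) & \cdots & & C_X(t_k, t_k) \end{bmatrix}. \tag{9.47b}$$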


Gaussian random processes therefore have the very special property that their joint pdf's
are completely specified by the mean function of the process $m_X(t)$ and by the covariance
function $C_X(t_1, t_2)$. In Chapter 6 we saw that linear transformations of jointly
Gaussian random vectors result in jointly Gaussian random vectors as well. We will see
in Chapter 10 that Gaussian random processes also have the property that linear
operations on a Gaussian process (e.g., a sum, derivative, or integral) result in another
Gaussian random process. These two properties, combined with the fact that many signal
and noise processes are accurately modeled as Gaussian, make Gaussian random
processes the most useful model in signal processing.


Example 9.27 iid Discrete-Time Gaussian Random Process 


Let the discrete-time random process $X_n$ be a sequence of independent Gaussian random
variables with mean m and variance $\sigma^2$. The covariance matrix for the times $n_1, \ldots, n_k$ is


$$\{C_X(n_i, n_j)\} = \{\sigma^2 \delta_{ij}\} = \sigma^2 I,$$


where $\delta_{ij} = 1$ when i = j and 0 otherwise, and I is the identity matrix. Thus the joint pdf for the
vector $\mathbf{X} = (X_{n_1}, \ldots, X_{n_k})$ is


$$f_{\mathbf{X}}(x_1, x_2, \ldots, x_k) = \frac{1}{(2\pi\sigma^2)^{k/2}} \exp\left\{-\sum_{i=1}^{k} (x_i - m)^2 / 2\sigma^2\right\}.$$


The Gaussian iid random process has the property that the value at every time instant is inde- 
pendent of the value at all other time instants. 
Example 9.28 Continuous-Time Gaussian Random Process


Let X(t) be a continuous-time Gaussian random process with mean function and covariance
function given by:


$$m_X(t) = 3t \qquad C_X(t_1, t_2) = 9e^{-2|t_1 - t_2|}.$$


Find P[X(3) < 6] and P[X(1) + X(2) > 15].
The sample X(3) has a Gaussian pdf with mean $m_X(3) = 3(3) = 9$ and variance
$\sigma_X^2(3) = C_X(3, 3) = 9e^{0} = 9$. To calculate P[X(3) < 6] we put X(3) in standard form:


$$P[X(3) < 6] = P\left[\frac{X(3) - 9}{\sqrt{9}} < \frac{6 - 9}{\sqrt{9}}\right] = 1 - Q(-1) = Q(1) = 0.16.$$


From Example 6.24 we know that the sum of two Gaussian random variables is also a Gaussian
random variable with mean and variance given by Eq. (6.47). Therefore the mean and variance
of X(1) + X(2) are given by:


$$E[X(1) + X(2)] = m_X(1) + m_X(2) = 3 + 6 = 9$$


$$\begin{aligned} \mathrm{VAR}[X(1) + X(2)] &= C_X(1, 1) + C_X(1, 2) + C_X(2, 1) + C_X(2, 2) \\ &= 9\{e^{-2|1-1|} + e^{-2|2-1|} + e^{-2|1-2|} + e^{-2|2-2|}\} \\ &= 9\{2 + 2e^{-2}\} = 20.43. \end{aligned}$$


To calculate P[X(1) + X(2) > 15] we put X(1) + X(2) in standard form:


$$P[X(1) + X(2) > 15] = P\left[\frac{X(1) + X(2) - 9}{\sqrt{20.43}} > \frac{15 - 9}{\sqrt{20.43}}\right] = Q(1.327) = 0.0922.$$
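

The two probabilities in Example 9.28 can be reproduced with a few lines of Python. This is a sketch, not part of the original example: it assumes the mean function 3t and the covariance function 9 exp(-2|t1 - t2|) implied by the variance computation above, and the helper names (`q_function`, `prob_less_than`, `prob_sum_exceeds`) are hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P[Z > x] for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mean_fn(t):
    """Assumed mean function m_X(t) = 3t."""
    return 3.0 * t

def cov_fn(t1, t2):
    """Assumed covariance function C_X(t1, t2) = 9 exp(-2|t1 - t2|)."""
    return 9.0 * math.exp(-2.0 * abs(t1 - t2))

def prob_less_than(t, level):
    """P[X(t) < level] for a single Gaussian sample X(t)."""
    z = (level - mean_fn(t)) / math.sqrt(cov_fn(t, t))
    return 1.0 - q_function(z)

def prob_sum_exceeds(t1, t2, level):
    """P[X(t1) + X(t2) > level]; the sum of jointly Gaussian samples is Gaussian."""
    mean = mean_fn(t1) + mean_fn(t2)
    var = cov_fn(t1, t1) + cov_fn(t1, t2) + cov_fn(t2, t1) + cov_fn(t2, t2)
    return q_function((level - mean) / math.sqrt(var))

if __name__ == "__main__":
    print(prob_less_than(3.0, 6.0))         # compare with Q(1)
    print(prob_sum_exceeds(1.0, 2.0, 15.0)) # compare with Q(1.327)
```

The only facts used are the ones stated in the example: a sample of a Gaussian process is Gaussian, and the variance of a sum adds all four covariance terms.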


9.5.2 Wiener Process and Brownian Motion


We now construct a continuous-time Gaussian random process as a limit of a discrete-time
process. Suppose that the symmetric random walk process (i.e., p = 1/2) of
Example 9.16 takes steps of magnitude $\pm h$ every $\delta$ seconds. We obtain a continuous-time
process by letting $X_\delta(t)$ be the accumulated sum of the random step process up to time
t. $X_\delta(t)$ is a staircase function of time that takes jumps of $\pm h$ every $\delta$ seconds. At time t,
the process will have taken $n = \lfloor t/\delta \rfloor$ jumps, so it is equal to


$$X_\delta(t) = h(D_1 + D_2 + \cdots + D_n) = hS_n. \tag{9.48}$$


The mean and variance of $X_\delta(t)$ are


$$E[X_\delta(t)] = hE[S_n] = 0$$
$$\mathrm{VAR}[X_\delta(t)] = h^2 n\,\mathrm{VAR}[D_n] = h^2 n,$$


where we used the fact that $\mathrm{VAR}[D_n] = 4p(1 - p) = 1$ since p = 1/2.
FIGURE 9.12
Four sample functions of the Wiener process. 


Suppose that we take a limit where we simultaneously shrink the size of the
jumps and the time between jumps. In particular let $\delta \to 0$ and $h \to 0$ with $h = \sqrt{\alpha\delta}$
and let X(t) denote the resulting process.

X(t) then has mean and variance given by


$$E[X(t)] = 0 \tag{9.49a}$$
$$\mathrm{VAR}[X(t)] = (\sqrt{\alpha\delta})^2 (t/\delta) = \alpha t. \tag{9.49b}$$


Thus we obtain a continuous-time process X(t) that begins at the origin, has zero mean 
for all time, but has a variance that increases linearly with time. Figure 9.12 shows four 
sample functions of the process. Note the similarities in fluctuations to the realizations 
of a symmetric random walk in Fig. 9.7(b). X(t) is called the Wiener random process. It 
is used to model Brownian motion, the motion of particles suspended in a fluid that 
move under the rapid and random impact of neighboring particles. 

As $\delta \to 0$, Eq. (9.48) implies that X(t) approaches the sum of an infinite number
of random variables since $n = \lfloor t/\delta \rfloor \to \infty$:


$$X(t) = \lim_{\delta \to 0} hS_n = \lim_{n \to \infty} \sqrt{\alpha t}\,\frac{S_n}{\sqrt{n}}. \tag{9.50}$$


By the central limit theorem the pdf of X(t) therefore approaches that of a Gaussian
random variable with mean zero and variance $\alpha t$:


$$f_{X(t)}(x) = \frac{1}{\sqrt{2\pi\alpha t}}\,e^{-x^2/2\alpha t}. \tag{9.51}$$


X(t) inherits the property of independent and stationary increments from the
random walk process from which it is derived. As a result, the joint pdf of X(t) at
several times $t_1, t_2, \ldots, t_k$ can be obtained by using Eq. (9.30):


$$\begin{aligned} f_{X(t_1), \ldots, X(t_k)}(x_1, \ldots, x_k) &= f_{X(t_1)}(x_1)\,f_{X(t_2 - t_1)}(x_2 - x_1) \cdots f_{X(t_k - t_{k-1})}(x_k - x_{k-1}) \\ &= \frac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{x_1^2}{\alpha t_1} + \dfrac{(x_2 - x_1)^2}{\alpha(t_2 - t_1)} + \cdots + \dfrac{(x_k - x_{k-1})^2}{\alpha(t_k - t_{k-1})}\right]\right\}}{\sqrt{(2\pi\alpha)^k\,t_1 (t_2 - t_1) \cdots (t_k - t_{k-1})}}. \end{aligned} \tag{9.52}$$


The independent increments property and the same sequence of steps that led to
Eq. (9.33) can be used to show that the autocovariance of X(t) is given by


$$C_X(t_1, t_2) = \alpha \min(t_1, t_2) = \alpha t_1 \quad \text{for } t_1 < t_2. \tag{9.53}$$


By comparing Eq. (9.53) and Eq. (9.35b), we see that the Wiener process and the Pois- 
son process have the same covariance function despite the fact that the two processes 
have very different sample functions. This underscores the fact that the mean and au- 
tocovariance functions are only partial descriptions of a random process. 
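

The limiting construction of the Wiener process can be explored numerically. The Python sketch below is an illustration under stated assumptions: it simulates the scaled random walk with jump size sqrt(αδ) for a small but nonzero δ, and estimates the variance at one time and the covariance between two times, which can be compared with αt and α min(t1, t2). The function names, the value of α, the step size, and the seed are hypothetical choices.

```python
import math
import random

def scaled_walk(rng, alpha, delta, step_counts):
    """Return the walk value at times n*delta for each n in the increasing list step_counts."""
    h = math.sqrt(alpha * delta)  # jump size h = sqrt(alpha * delta)
    position, done, values = 0.0, 0, []
    for n in step_counts:
        for _ in range(n - done):
            position += h if rng.random() < 0.5 else -h  # symmetric +/- h step
        done = n
        values.append(position)
    return values

def estimate_moments(alpha=2.0, delta=1.0 / 128, n1=64, n2=128, trials=4000, seed=7):
    """Estimate VAR[X(t1)] and COV[X(t1), X(t2)] with t1 = n1*delta and t2 = n2*delta."""
    rng = random.Random(seed)
    s1 = s2 = s11 = s12 = 0.0
    for _ in range(trials):
        x1, x2 = scaled_walk(rng, alpha, delta, [n1, n2])
        s1 += x1
        s2 += x2
        s11 += x1 * x1
        s12 += x1 * x2
    m1, m2 = s1 / trials, s2 / trials
    return {
        "t1": n1 * delta,
        "t2": n2 * delta,
        "var_t1": s11 / trials - m1 * m1,
        "cov": s12 / trials - m1 * m2,
    }

if __name__ == "__main__":
    result = estimate_moments()
    print(result)  # compare var_t1 with alpha*t1 and cov with alpha*min(t1, t2)
```

With the default values t1 = 0.5 and t2 = 1, both the variance at t1 and the covariance should be close to α t1, up to sampling error.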


Example 9.29 


Show that the Wiener process is a Gaussian random process. 

Equation (9.52) shows that the random variables $X(t_1), X(t_2) - X(t_1), X(t_3) - X(t_2), \ldots,$
$X(t_k) - X(t_{k-1})$ are independent Gaussian random variables. The random variables
$X(t_1), X(t_2), X(t_3), \ldots, X(t_k)$ can be obtained from $X(t_1)$ and the increments by a linear
transformation:


$$\begin{aligned} X(t_3) &= X(t_1) + (X(t_2) - X(t_1)) + (X(t_3) - X(t_2)) \\ &\;\;\vdots \\ X(t_k) &= X(t_1) + (X(t_2) - X(t_1)) + \cdots + (X(t_k) - X(t_{k-1})). \end{aligned} \tag{9.54}$$


It then follows (from Eq. 6.45) that $X(t_1), X(t_2), X(t_3), \ldots, X(t_k)$ are jointly Gaussian random
variables, and that X(t) is a Gaussian random process.


9.6 STATIONARY RANDOM PROCESSES


Many random processes have the property that the nature of the randomness in the 
process does not change with time. An observation of the process in the time interval 
$(t_0, t_1)$ exhibits the same type of random behavior as an observation in some other
time interval $(t_0 + \tau, t_1 + \tau)$. This leads us to postulate that the probabilities of samples
of the process do not depend on the instant when we begin taking observations,
that is, probabilities involving samples taken at times $t_1, \ldots, t_k$ will not differ from
those taken at $t_1 + \tau, \ldots, t_k + \tau$.


Example 9.30 Stationarity and Transience 


An urn has 6 balls each with the label “0” and 5 balls each with the label “1”. The following
sequence of experiments is performed: A ball is selected and the number noted; the first time a
ball labeled “0” is selected it is not put back in the urn, but otherwise balls are always put back in the urn.
The random process that results from this sequence of experiments clearly has a transient
phase and a stationary phase. The transient phase consists of a string of n consecutive 1’s and it
ends with the first occurrence of a “0”. During the transient phase $P[I_n = 0] = 6/11$, and the
duration of the transient phase is geometrically distributed with mean 11/6. After the first
occurrence of a “0”, the urn holds 5 balls of each label, so the process enters a “stationary” phase
where it is a binary equiprobable iid sequence. The statistical behavior of the process does not
change once the stationary phase is reached.


If we are dealing with random processes that began at $t = -\infty$, then the above condition
can be stated precisely as follows. A discrete-time or continuous-time random process
X(t) is stationary if the joint distribution of any set of samples does not depend on the placement
of the time origin. This means that the joint cdf of $X(t_1), X(t_2), \ldots, X(t_k)$ is the
same as that of $X(t_1 + \tau), X(t_2 + \tau), \ldots, X(t_k + \tau)$:


$$F_{X(t_1), \ldots, X(t_k)}(x_1, \ldots, x_k) = F_{X(t_1 + \tau), \ldots, X(t_k + \tau)}(x_1, \ldots, x_k), \tag{9.55}$$


for all time shifts $\tau$, all k, and all choices of sample times $t_1, \ldots, t_k$. If a process begins
at some definite time (i.e., n = 0 or t = 0), then we say it is stationary if its joint distributions
do not change under time shifts to the right.

Two processes X(t) and Y(t) are said to be jointly stationary if the joint cdf’s of
$X(t_1), \ldots, X(t_k)$ and $Y(t_1'), \ldots, Y(t_j')$ do not depend on the placement of the time origin
for all k and j and all choices of sampling times $t_1, \ldots, t_k$ and $t_1', \ldots, t_j'$.

The first-order cdf of a stationary random process must be independent of time,
since by Eq. (9.55),

$$F_{X(t)}(x) = F_{X(t + \tau)}(x) = F_X(x) \quad \text{all } t, \tau. \tag{9.56}$$


This implies that the mean and variance of X(t) are constant and independent of time:

$$m_X(t) = E[X(t)] = m \quad \text{for all } t \tag{9.57}$$

$$\mathrm{VAR}[X(t)] = E[(X(t) - m)^2] = \sigma^2 \quad \text{for all } t. \tag{9.58}$$

The second-order cdf of a stationary random process can depend only on the time
difference between the samples and not on the particular time of the samples, since by
Eq. (9.55),


$$F_{X(t_1), X(t_2)}(x_1, x_2) = F_{X(0), X(t_2 - t_1)}(x_1, x_2) \quad \text{for all } t_1, t_2. \tag{9.59}$$

This implies that the autocorrelation and the autocovariance of X(t) can depend only
on $t_2 - t_1$:

$$R_X(t_1, t_2) = R_X(t_2 - t_1) \quad \text{for all } t_1, t_2 \tag{9.60}$$

$$C_X(t_1, t_2) = C_X(t_2 - t_1) \quad \text{for all } t_1, t_2. \tag{9.61}$$


Example 9.31 iid Random Process 


Show that the iid random process is stationary. 
The joint cdf for the samples at any k time instants, $t_1, \ldots, t_k$, is

$$\begin{aligned} F_{X(t_1), \ldots, X(t_k)}(x_1, x_2, \ldots, x_k) &= F_X(x_1) F_X(x_2) \cdots F_X(x_k) \\ &= F_{X(t_1 + \tau), \ldots, X(t_k + \tau)}(x_1, \ldots, x_k), \end{aligned}$$

for all k, $t_1, \ldots, t_k$. Thus Eq. (9.55) is satisfied, and so the iid random process is stationary.


Example 9.32 


Is the sum process a discrete-time stationary process? 
The sum process is defined by $S_n = X_1 + X_2 + \cdots + X_n$, where the $X_i$ are an iid sequence.
The process has mean and variance


$$m_S(n) = nm \qquad \mathrm{VAR}[S_n] = n\sigma^2,$$


where m and $\sigma^2$ are the mean and variance of the $X_n$. It can be seen that the mean and variance
are not constant but grow linearly with the time index n. Therefore the sum process cannot be a 
stationary process. 


Example 9.33 Random Telegraph Signal 


Show that the random telegraph signal discussed in Example 9.24 is a stationary random process
when $P[X(0) = \pm 1] = 1/2$. Show that X(t) settles into stationary behavior as $t \to \infty$ even if
$P[X(0) = \pm 1] \neq 1/2$.

We need to show that the following two joint pmf’s are equal:


$$P[X(t_1) = a_1, \ldots, X(t_k) = a_k] = P[X(t_1 + \tau) = a_1, \ldots, X(t_k + \tau) = a_k],$$


for any k, any $t_1 < \cdots < t_k$, and any $a_j = \pm 1$. The independent increments property of the
Poisson process implies that


$$\begin{aligned} P[X(t_1) = a_1, \ldots, X(t_k) = a_k] = {} & P[X(t_1) = a_1] \\ & \times P[X(t_2) = a_2 \mid X(t_1) = a_1] \cdots P[X(t_k) = a_k \mid X(t_{k-1}) = a_{k-1}], \end{aligned}$$

since the values of the random telegraph at the times $t_1, \ldots, t_k$ are determined by the number of
occurrences of events of the Poisson process in the time intervals $(t_j, t_{j+1})$. Similarly,

$$\begin{aligned} P[X(t_1 + \tau) = a_1, \ldots, X(t_k + \tau) = a_k] = {} & P[X(t_1 + \tau) = a_1]\,P[X(t_2 + \tau) = a_2 \mid X(t_1 + \tau) = a_1] \cdots \\ & \times P[X(t_k + \tau) = a_k \mid X(t_{k-1} + \tau) = a_{k-1}]. \end{aligned}$$

The corresponding transition probabilities in the previous two equations are equal since


$$\begin{aligned} P[X(t_{j+1}) = a_{j+1} \mid X(t_j) = a_j] &= \begin{cases} \frac{1}{2}\{1 + e^{-2\alpha(t_{j+1} - t_j)}\} & \text{if } a_j = a_{j+1} \\ \frac{1}{2}\{1 - e^{-2\alpha(t_{j+1} - t_j)}\} & \text{if } a_j \neq a_{j+1} \end{cases} \\ &= P[X(t_{j+1} + \tau) = a_{j+1} \mid X(t_j + \tau) = a_j]. \end{aligned}$$

Thus the two joint probabilities differ only in the first term, namely, $P[X(t_1) = a_1]$ and
$P[X(t_1 + \tau) = a_1]$.

From Example 9.24 we know that if $P[X(0) = \pm 1] = 1/2$ then $P[X(t) = \pm 1] = 1/2$, for
all t. Thus $P[X(t_1) = a_1] = 1/2$, $P[X(t_1 + \tau) = a_1] = 1/2$, and


$$P[X(t_1) = a_1, \ldots, X(t_k) = a_k] = P[X(t_1 + \tau) = a_1, \ldots, X(t_k + \tau) = a_k].$$


Thus we conclude that the process is stationary when $P[X(0) = \pm 1] = 1/2$.

If $P[X(0) = \pm 1] \neq 1/2$, then the two joint pmf’s are not equal because
$P[X(t_1) = a_1] \neq P[X(t_1 + \tau) = a_1]$. Let’s see what happens if we know that the process started at a
specific value, say X(0) = 1, that is, P[X(0) = 1] = 1. The pmf for X(t) is obtained from Eqs.
(9.39) through (9.41):


$$P[X(t) = a] = P[X(t) = a \mid X(0) = 1] \cdot 1 = \begin{cases} \frac{1}{2}\{1 + e^{-2\alpha t}\} & \text{if } a = 1 \\ \frac{1}{2}\{1 - e^{-2\alpha t}\} & \text{if } a = -1. \end{cases}$$


For very small t, the probability that X(t) = 1 is close to 1; but as t increases, the probability that
X(t) = 1 approaches 1/2. Therefore as $t_1$ becomes large, $P[X(t_1) = a_1] \to 1/2$ and
$P[X(t_1 + \tau) = a_1] \to 1/2$ and the two joint pmf’s become equal. In other words, the process “forgets” the initial
condition and settles down into “steady state,” that is, stationary behavior.
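

The way the random telegraph signal “forgets” its initial condition can be illustrated with a short simulation. The Python sketch below is only an illustration and makes explicit assumptions: the signal flips polarity at each event of a Poisson process of rate α, it starts at X(0) = 1, and the estimated probability that X(t) = 1 is compared with the expression (1/2)(1 + exp(-2αt)). The function names, the value of α, and the seed are hypothetical.

```python
import math
import random

def telegraph_value(rng, alpha, t, x0=1):
    """X(t) given X(0) = x0: the polarity flips at each event of a rate-alpha Poisson process."""
    flips, clock = 0, rng.expovariate(alpha)
    while clock <= t:
        flips += 1
        clock += rng.expovariate(alpha)
    return x0 if flips % 2 == 0 else -x0

def prob_plus_one(alpha, t, trials=20000, seed=3):
    """Monte Carlo estimate of P[X(t) = 1 | X(0) = 1]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if telegraph_value(rng, alpha, t) == 1)
    return hits / trials

def prob_plus_one_exact(alpha, t):
    """Closed form (1/2)(1 + exp(-2*alpha*t)) assumed from the random telegraph example."""
    return 0.5 * (1.0 + math.exp(-2.0 * alpha * t))

if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        print(t, prob_plus_one(1.0, t), prob_plus_one_exact(1.0, t))
```

For small t the estimate stays near 1, and for large t it approaches 1/2, which is the steady-state behavior described above.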


9.6.1 Wide-Sense Stationary Random Processes


In many situations we cannot determine whether a random process is stationary, but 
we can determine whether the mean is a constant: 


$$m_X(t) = m \quad \text{for all } t, \tag{9.62}$$

and whether the autocovariance (or equivalently the autocorrelation) is a function of
$t_1 - t_2$ only:

$$C_X(t_1, t_2) = C_X(t_1 - t_2) \quad \text{for all } t_1, t_2. \tag{9.63}$$

A discrete-time or continuous-time random process X(t) is wide-sense stationary (WSS)
if it satisfies Eqs. (9.62) and (9.63). Similarly, we say that the processes X(t) and Y(t) are
jointly wide-sense stationary if they are both wide-sense stationary and if their cross-covariance
depends only on $t_1 - t_2$. When X(t) is wide-sense stationary, we write


$$C_X(t_1, t_2) = C_X(\tau) \quad \text{and} \quad R_X(t_1, t_2) = R_X(\tau),$$

where $\tau = t_1 - t_2$.

All stationary random processes are wide-sense stationary since they satisfy Eqs.
(9.62) and (9.63). The following example shows that some wide-sense stationary
processes are not stationary.


Example 9.34 


Let $X_n$ consist of two interleaved sequences of independent random variables. For n even, $X_n$
assumes the values $\pm 1$ with probability 1/2; for n odd, $X_n$ assumes the values 1/3 and −3 with
probabilities 9/10 and 1/10, respectively. $X_n$ is not stationary since its pmf varies with n. It is easy
to show that $X_n$ has mean


$$m_X(n) = 0 \quad \text{for all } n$$


and covariance function


$$C_X(i, j) = \begin{cases} E[X_i]E[X_j] = 0 & \text{for } i \neq j \\ E[X_i^2] = 1 & \text{for } i = j. \end{cases}$$

$X_n$ is therefore wide-sense stationary.
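

The moments claimed in Example 9.34 can be verified with exact rational arithmetic. The Python sketch below is an illustration with hypothetical function names. It computes the mean and variance for even and odd n from the pmf's stated in the example, and also the third moment, which differs between even and odd n and so shows concretely that the process is not stationary even though its first two moments do not depend on n.

```python
from fractions import Fraction

# pmf's from the example: value -> probability
EVEN_PMF = {Fraction(1): Fraction(1, 2), Fraction(-1): Fraction(1, 2)}
ODD_PMF = {Fraction(1, 3): Fraction(9, 10), Fraction(-3): Fraction(1, 10)}

def pmf(n):
    """pmf of X_n: one distribution for even n, another for odd n."""
    return EVEN_PMF if n % 2 == 0 else ODD_PMF

def moment(n, order):
    """E[X_n ** order], computed exactly."""
    return sum((x ** order) * p for x, p in pmf(n).items())

def covariance(i, j):
    """C_X(i, j); samples at different times are independent, so the covariance is zero."""
    if i != j:
        return Fraction(0)
    return moment(i, 2) - moment(i, 1) ** 2

if __name__ == "__main__":
    for n in (0, 1):
        print(n, moment(n, 1), covariance(n, n), moment(n, 3))
```

The third moment is used here only as one convenient statistic that reveals the dependence on n; any quantity that depends on the full pmf would serve.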


We will see in Chapter 10 that the autocorrelation function of wide-sense station- 
ary processes plays a crucial role in the design of linear signal processing algorithms. 
We now develop several results that enable us to deduce properties of a WSS process 
from properties of its autocorrelation function. 

First, the autocorrelation function at $\tau = 0$ gives the average power (second moment)
of the process:


$$R_X(0) = E[X(t)^2] \quad \text{for all } t. \tag{9.64}$$

Second, the autocorrelation function is an even function of $\tau$ since

$$R_X(\tau) = E[X(t + \tau)X(t)] = E[X(t)X(t + \tau)] = R_X(-\tau). \tag{9.65}$$


Third, the autocorrelation function is a measure of the rate of change of a random
process in the following sense. Consider the change in the process from time t to $t + \tau$:


$$\begin{aligned} P[|X(t + \tau) - X(t)| > \varepsilon] &= P[(X(t + \tau) - X(t))^2 > \varepsilon^2] \\ &\le \frac{E[(X(t + \tau) - X(t))^2]}{\varepsilon^2} \\ &= \frac{2\{R_X(0) - R_X(\tau)\}}{\varepsilon^2}, \end{aligned} \tag{9.66}$$


where we used the Markov inequality, Eq. (4.75), to obtain the upper bound. Equation
(9.66) states that if $R_X(0) - R_X(\tau)$ is small, that is, $R_X(\tau)$ drops off slowly, then the
probability of a large change in X(t) in $\tau$ seconds is small.
Fourth, the autocorrelation function is maximum at $\tau = 0$. We use the Cauchy-Schwarz
inequality:²

$$E[XY]^2 \le E[X^2]E[Y^2], \tag{9.67}$$


for any two random variables X and Y. If we apply this equation to $X(t + \tau)$ and X(t),
we obtain


$$R_X(\tau)^2 = E[X(t + \tau)X(t)]^2 \le E[X^2(t + \tau)]E[X^2(t)] = R_X(0)^2.$$


Thus

$$|R_X(\tau)| \le R_X(0). \tag{9.68}$$


²See Problem 5.74 and Appendix C.
Fifth, if $R_X(0) = R_X(d)$, then $R_X(\tau)$ is periodic with period d and X(t) is mean
square periodic, that is, $E[(X(t + d) - X(t))^2] = 0$. If we apply Eq. (9.67) to
$X(t + \tau + d) - X(t + \tau)$ and X(t), we obtain


$$E[(X(t + \tau + d) - X(t + \tau))X(t)]^2 \le E[(X(t + \tau + d) - X(t + \tau))^2]\,E[X^2(t)],$$

which implies that

$$\{R_X(\tau + d) - R_X(\tau)\}^2 \le 2\{R_X(0) - R_X(d)\}R_X(0).$$


Thus $R_X(d) = R_X(0)$ implies that the right-hand side of the equation is zero, and thus
that $R_X(\tau + d) = R_X(\tau)$ for all $\tau$. Repeated applications of this result imply that
$R_X(\tau)$ is periodic with period d. The fact that X(t) is mean square periodic follows from


$$E[(X(t + d) - X(t))^2] = 2\{R_X(0) - R_X(d)\} = 0.$$


Sixth, let X(t) = m + N(t), where N(t) is a zero-mean process for which
$R_N(\tau) \to 0$ as $\tau \to \infty$, then


$$\begin{aligned} R_X(\tau) &= E[(m + N(t + \tau))(m + N(t))] = m^2 + 2mE[N(t)] + R_N(\tau) \\ &= m^2 + R_N(\tau) \to m^2 \quad \text{as } \tau \to \infty. \end{aligned}$$


In other words, $R_X(\tau)$ approaches the square of the mean of X(t) as $\tau \to \infty$.

In summary, the autocorrelation function can have three types of components:
(1) a component that approaches zero as $\tau \to \infty$; (2) a periodic component; and (3) a
component due to a nonzero mean.


Example 9.35 


Figure 9.13 shows several typical autocorrelation functions. Figure 9.13(a) shows the autocorrelation
function for the random telegraph signal X(t) (see Eq. (9.44)):


$$R_X(\tau) = e^{-2\alpha|\tau|} \quad \text{for all } \tau.$$


X(t) is zero mean and $R_X(\tau) \to 0$ as $|\tau| \to \infty$.
Figure 9.13(b) shows the autocorrelation function for a sinusoid Y(t) with amplitude a and
random phase (see Example 9.10):

$$R_Y(\tau) = \frac{a^2}{2}\cos(2\pi f_0 \tau) \quad \text{for all } \tau.$$


Y(t) is zero mean and $R_Y(\tau)$ is periodic with period $1/f_0$.

Figure 9.13(c) shows the autocorrelation function for the process Z(t) = X(t) + Y(t) + m,
where X(t) is the random telegraph process, Y(t) is a sinusoid with random phase, and m is a constant.
If we assume that X(t) and Y(t) are independent processes, then


$$\begin{aligned} R_Z(\tau) &= E[\{X(t + \tau) + Y(t + \tau) + m\}\{X(t) + Y(t) + m\}] \\ &= R_X(\tau) + R_Y(\tau) + m^2. \end{aligned}$$

FIGURE 9.13 

(a) Autocorrelation function of a random telegraph signal. (b) Autocorrelation 
function of a sinusoid with random phase. (c) Autocorrelation function of a random 
process that has nonzero mean, a periodic component, and a “random” component. 
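

The properties of the autocorrelation function listed earlier (evenness, maximum at zero lag, periodic and constant components) can be checked on the three functions of Example 9.35. The Python sketch below is illustrative only: it assumes the forms exp(-2α|τ|) for the random telegraph signal and (a²/2)cos(2π f0 τ) for the random-phase sinusoid, and the parameter values and function names are hypothetical.

```python
import math

def r_telegraph(tau, alpha=1.0):
    """Assumed autocorrelation of the random telegraph signal: exp(-2*alpha*|tau|)."""
    return math.exp(-2.0 * alpha * abs(tau))

def r_sinusoid(tau, a=1.0, f0=0.5):
    """Assumed autocorrelation of a random-phase sinusoid: (a^2 / 2) cos(2 pi f0 tau)."""
    return 0.5 * a * a * math.cos(2.0 * math.pi * f0 * tau)

def r_sum(tau, m=2.0, alpha=1.0, a=1.0, f0=0.5):
    """Autocorrelation of Z(t) = X(t) + Y(t) + m for independent zero-mean X and Y."""
    return r_telegraph(tau, alpha) + r_sinusoid(tau, a, f0) + m * m

def check_properties(r, taus):
    """Return (is_even, is_bounded_by_value_at_zero) over the supplied lags."""
    even = all(abs(r(tau) - r(-tau)) < 1e-12 for tau in taus)
    bounded = all(abs(r(tau)) <= r(0.0) + 1e-12 for tau in taus)
    return even, bounded

if __name__ == "__main__":
    lags = [0.01 * i for i in range(1000)]
    for name, r in (("telegraph", r_telegraph), ("sinusoid", r_sinusoid), ("sum", r_sum)):
        print(name, check_properties(r, lags))
```

At large lags the sum retains only the periodic term and the constant m², which matches the three-component summary given before the example.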


9.6.2 Wide-Sense Stationary Gaussian Random Processes


If a Gaussian random process is wide-sense stationary, then it is also stationary. Recall
from Section 9.5, Eq. (9.47), that the joint pdf of a Gaussian random process is completely
determined by the mean $m_X(t)$ and the autocovariance $C_X(t_1, t_2)$. If X(t) is
wide-sense stationary, then its mean is a constant m and its autocovariance depends
only on the difference of the sampling times, $t_i - t_j$. It then follows that the joint pdf of
X(t) depends only on this set of differences, and hence it is invariant with respect to
time shifts. Thus the process is also stationary.

The above result makes WSS Gaussian random processes particularly easy to work
with since all the information required to specify the joint pdf is contained in m and $C_X(\tau)$.


Example 9.36 A Gaussian Moving Average Process 


Let $X_n$ be an iid sequence of Gaussian random variables with zero mean and variance $\sigma^2$, and let
$Y_n$ be the average of two consecutive values of $X_n$:


$$Y_n = \frac{X_n + X_{n-1}}{2}.$$


The mean of $Y_n$ is zero since $E[X_i] = 0$ for all i. The covariance is


$$\begin{aligned} C_Y(i, j) &= E[Y_i Y_j] = \frac{1}{4}E[(X_i + X_{i-1})(X_j + X_{j-1})] \\ &= \frac{1}{4}\{E[X_i X_j] + E[X_i X_{j-1}] + E[X_{i-1} X_j] + E[X_{i-1} X_{j-1}]\} \\ &= \begin{cases} \frac{1}{2}\sigma^2 & \text{if } i = j \\ \frac{1}{4}\sigma^2 & \text{if } |i - j| = 1 \\ 0 & \text{otherwise.} \end{cases} \end{aligned}$$


We see that $Y_n$ has a constant mean and a covariance function that depends only on $|i - j|$, thus
$Y_n$ is a wide-sense stationary process. $Y_n$ is a Gaussian random variable since it is defined by a
linear function of Gaussian random variables (see Section 6.4, Eq. 6.45). Thus the joint pdf of $Y_n$
is given by Eq. (9.47) with zero-mean vector and with entries of the covariance matrix specified
by $C_Y(i, j)$ above.


9.6.3 Cyclostationary Random Processes


Many random processes arise from the repetition of a given procedure every T seconds. 
For example, a data modulator (“modem”) produces a waveform every T seconds ac- 
cording to some input data sequence. In another example, a “time multiplexer” inter- 
leaves n separate sequences of information symbols into a single sequence of symbols. It 
should not be surprising that the periodic nature of such processes is evident in their probabilistic
descriptions. A discrete-time or continuous-time random process X(t) is said to be
cyclostationary if the joint cumulative distribution function of any set of samples is invariant
with respect to shifts of the origin by integer multiples of some period T. In other words,
$X(t_1), X(t_2), \ldots, X(t_k)$ and $X(t_1 + mT), X(t_2 + mT), \ldots, X(t_k + mT)$ have the
same joint cdf for all k, m, and all choices of sampling times $t_1, \ldots, t_k$:

$$F_{X(t_1), X(t_2), \ldots, X(t_k)}(x_1, x_2, \ldots, x_k) = F_{X(t_1 + mT), X(t_2 + mT), \ldots, X(t_k + mT)}(x_1, x_2, \ldots, x_k). \tag{9.69}$$


We say that X(t) is wide-sense cyclostationary if the mean and autocovariance functions
are invariant with respect to shifts in the time origin by integer multiples of T,
that is, for every integer m,


$$m_X(t + mT) = m_X(t) \tag{9.70a}$$

$$C_X(t_1 + mT, t_2 + mT) = C_X(t_1, t_2). \tag{9.70b}$$


Note that if X(t) is cyclostationary, then it follows that X(t) is also wide-sense cyclosta- 
tionary. 
Example 9.37


Consider a random amplitude sinusoid with period T:

$$X(t) = A\cos(2\pi t/T).$$


Is X(t) cyclostationary? wide-sense cyclostationary?
Consider the joint cdf for the time samples $t_1, \ldots, t_k$:


$$\begin{aligned} P[X(t_1) \le x_1, & \; X(t_2) \le x_2, \ldots, X(t_k) \le x_k] \\ &= P[A\cos(2\pi t_1/T) \le x_1, \ldots, A\cos(2\pi t_k/T) \le x_k] \\ &= P[A\cos(2\pi(t_1 + mT)/T) \le x_1, \ldots, A\cos(2\pi(t_k + mT)/T) \le x_k] \\ &= P[X(t_1 + mT) \le x_1, X(t_2 + mT) \le x_2, \ldots, X(t_k + mT) \le x_k]. \end{aligned}$$


Thus X(t) is a cyclostationary random process and hence also a wide-sense cyclostationary 
process. 


In the above example, the sample functions of the random process are always pe- 
riodic. The following example shows that, in general, the sample functions of a cyclo- 
stationary random process need not be periodic. 


Example 9.38 Pulse Amplitude Modulation 


A modem transmits a binary iid equiprobable data sequence as follows: To transmit a binary 1, 
the modem transmits a rectangular pulse of duration T seconds and amplitude 1; to transmit a bi- 
nary 0, it transmits a rectangular pulse of duration T seconds and amplitude —1. Let X(t) be the 
random process that results. Is X(t) wide-sense cyclostationary? 

Figure 9.14(a) shows a rectangular pulse of duration T seconds, and Fig. 9.14(b) shows the
waveform that results for a particular data sequence. Let $A_n$ be the sequence of amplitudes ($\pm 1$)
corresponding to the binary sequence, then X(t) can be represented as the sum of amplitude-modulated
time-shifted rectangular pulses:


$$X(t) = \sum_{n=-\infty}^{\infty} A_n p(t - nT). \tag{9.71}$$


FIGURE 9.14
Pulse amplitude modulation. (b) Waveform corresponding to data sequence 1001.


The mean of X(t) is

$$m_X(t) = E\left[\sum_{n=-\infty}^{\infty} A_n p(t - nT)\right] = \sum_{n=-\infty}^{\infty} E[A_n]\,p(t - nT) = 0$$


since $E[A_n] = 0$. The autocovariance function is


$$C_X(t_1, t_2) = E[X(t_1)X(t_2)] - 0 = \begin{cases} E[X(t_1)^2] = 1 & \text{if } nT \le t_1, t_2 < (n + 1)T \\ E[X(t_1)]E[X(t_2)] = 0 & \text{otherwise.} \end{cases}$$


Figure 9.15 shows the autocovariance function in terms of $t_1$ and $t_2$. It is clear that
$C_X(t_1 + mT, t_2 + mT) = C_X(t_1, t_2)$ for all integers m. Therefore the process is wide-sense
cyclostationary.


We will now show how a stationary random process can be obtained from a cyclo- 
stationary process. Let X(t) be a cyclostationary process with period T. We “stationarize” 
X(t) by observing a randomly phase-shifted version of X(t): 


$$X_s(t) = X(t + \Theta) \qquad \Theta \text{ uniform in } [0, T], \tag{9.72}$$


where $\Theta$ is independent of X(t). $X_s(t)$ can arise when the phase of X(t) is either unknown
or not of interest. If X(t) is a cyclostationary random process, then $X_s(t)$ is a stationary
random process. To show this, we first use conditional expectation to find the
joint cdf of $X_s(t)$:


$$\begin{aligned} P[X_s(t_1) \le x_1, & \; X_s(t_2) \le x_2, \ldots, X_s(t_k) \le x_k] \\ &= P[X(t_1 + \Theta) \le x_1, X(t_2 + \Theta) \le x_2, \ldots, X(t_k + \Theta) \le x_k] \\ &= \int_0^T P[X(t_1 + \Theta) \le x_1, \ldots, X(t_k + \Theta) \le x_k \mid \Theta = \theta]\,f_\Theta(\theta)\,d\theta \\ &= \frac{1}{T}\int_0^T P[X(t_1 + \theta) \le x_1, \ldots, X(t_k + \theta) \le x_k]\,d\theta. \end{aligned} \tag{9.73}$$


FIGURE 9.15
Autocovariance function of pulse amplitude-modulated
random process.

Equation (9.73) shows that the joint cdf of $X_s(t)$ is obtained by integrating the joint cdf
of X(t) over one time period. It is easy to then show that a time-shifted version of $X_s(t)$,
say $X_s(t_1 + \tau), X_s(t_2 + \tau), \ldots, X_s(t_k + \tau)$, will have the same joint cdf as $X_s(t_1)$,
$X_s(t_2), \ldots, X_s(t_k)$ (see Problem 9.80). Therefore $X_s(t)$ is a stationary random process.

By using conditional expectation (see Problem 9.81), it is easy to show that if X(t)
is a wide-sense cyclostationary random process, then $X_s(t)$ is a wide-sense stationary
random process, with mean and autocorrelation given by


$$E[X_s(t)] = \frac{1}{T}\int_0^T m_X(t)\,dt \tag{9.74a}$$

$$R_{X_s}(\tau) = \frac{1}{T}\int_0^T R_X(t + \tau, t)\,dt. \tag{9.74b}$$

Example 9.39 Pulse Amplitude Modulation with Random Phase Shift 


Let $X_s(t)$ be the phase-shifted version of the pulse amplitude-modulated waveform X(t) introduced
in Example 9.38. Find the mean and autocorrelation function of $X_s(t)$.

$X_s(t)$ has zero mean since X(t) is zero-mean. The autocorrelation of $X_s(t)$ is obtained
from Eq. (9.74b). From Fig. 9.15, we can see that for $0 \le t + \tau < T$, $R_X(t + \tau, t) = 1$ and
$R_X(t + \tau, t) = 0$ otherwise. Therefore:


$$\text{for } 0 \le \tau < T: \quad R_{X_s}(\tau) = \frac{1}{T}\int_0^{T - \tau} dt = \frac{T - \tau}{T}.$$


9.7 CONTINUITY, DERIVATIVES, AND INTEGRALS OF RANDOM PROCESSES


Many of the systems encountered in electrical engineering have dynamics that are de- 
scribed by linear differential equations. When the input signals to these systems are de- 
terministic, the solutions of the differential equations give the output signals of the 
systems. In developing these solutions we make use of the results of calculus for deter- 
ministic functions. Since each sample function of a random process can be viewed as a 
deterministic signal, it is only natural to apply continuous-time random processes as 
input signals to the above systems. The output of the systems then consists of a sample 
function of another random process. On the other hand, if we view a system as acting 
on an input random process to produce an output random process, we find that we 
need to develop a new “calculus” for continuous-time random processes. In particular 
we need to develop probabilistic methods for addressing the continuity, differentiabili- 
ty, and integrability of random processes, that is, of the ensemble of sample functions as 
a whole. In this section we develop these concepts. 


9.7.1 Mean Square Continuity


A natural way of viewing a random process is to imagine that each point $\zeta$ in S produces
a particular deterministic sample function $X(t, \zeta)$. The standard methods from calculus
can be used to determine the continuity of the sample function at a point $t_0$ for each point
$\zeta$. Intuitively, we say that $X(t, \zeta)$ is continuous at $t_0$ if the difference $|X(t, \zeta) - X(t_0, \zeta)|$
approaches zero as t approaches $t_0$. More formally, we say that:


$X(t, \zeta)$ is continuous at $t_0$ if given any $\varepsilon > 0$ there exists a $\delta > 0$ such that
$|t - t_0| < \delta$ implies that $|X(t, \zeta) - X(t_0, \zeta)| < \varepsilon$, and we write:


$$\lim_{t \to t_0} X(t, \zeta) = X(t_0, \zeta).$$

In some simple cases, such as the random sinusoid discussed in Example 9.2, we can establish
that all sample functions of the random process are continuous at a point $t_0$, and
so we can conclude that the random process is continuous at $t_0$. In general, however, we
can only address the continuity of a random process in a probabilistic sense. In this section,
we concentrate on convergence in the mean square sense, introduced in Section
7.4, because of its tractability and its usefulness in the study of linear systems subject to
random inputs.


Mean Square Continuity: The random process X(t) is continuous at the
point $t_0$ in the mean square sense if


$$E[(X(t) - X(t_0))^2] \to 0 \quad \text{as } t \to t_0. \tag{9.75}$$

We denote mean square continuity by (limit in the mean)


$$\mathop{\mathrm{l.i.m.}}_{t \to t_0} X(t) = X(t_0).$$


We say that X(t) is mean square continuous if it is mean square continuous for all $t_0$.
Note that if all sample functions of a random process are continuous at a point $t_0$, then
the process will also be mean square continuous at the point $t_0$. In the examples we will
see that mean square continuity does not imply that all the sample functions are continuous.
Thus, in general, we may have the situation in Fig. 9.16.


FIGURE 9.16
(a) Mean square continuity at $t_0$ does not imply all sample functions are continuous at $t_0$. (b) If X(t) is WSS
and $R_X(\tau)$ is continuous at $\tau = 0$, then X(t) is mean square continuous for all t.

In order to determine what conditions are required for mean square continuity,
consider the mean square difference between X(t) and $X(t_0)$:


$$E[(X(t) - X(t_0))^2] = R_X(t, t) - R_X(t_0, t) - R_X(t, t_0) + R_X(t_0, t_0). \tag{9.76}$$


Hence, if the autocorrelation function $R_X(t_1, t_2)$ is continuous at the point $(t_0, t_0)$, then
letting $t \to t_0$, the right-hand side of Eq. (9.76) will vanish. Thus we conclude that if
$R_X(t_1, t_2)$ is continuous in both $t_1$ and $t_2$ at the point $(t_0, t_0)$, then X(t) is mean square
continuous at the point $t_0$.

At this point it is worth recalling that a function of two variables f(x, y) is continuous
at a point (a, b) if the limit of f(x, y) reaches the same value for any mode of approach
from (x, y) to (a, b). In particular, in order for $R_X(t_1, t_2)$ to be continuous at
$(t_0, t_0)$, $R_X(t_1, t_2)$ must approach the same value as $t_1$ and $t_2$ approach $(t_0, t_0)$ from any
direction.

A discontinuity in the mean function $m_X(t)$ at some point $t_0$ indicates that the
sample functions must be discontinuous at $t_0$ with nonzero probability. Therefore, we
must have that if X(t) is mean square continuous at $t_0$, then the mean function $m_X(t)$
must be continuous at $t_0$:

$$\lim_{t \to t_0} m_X(t) = m_X(t_0). \tag{9.77a}$$

To show this, we note that the variance of the difference $X(t) - X(t_0)$ is nonnegative,
thus


$$0 \le \mathrm{VAR}[X(t) - X(t_0)] = E[(X(t) - X(t_0))^2] - E[X(t) - X(t_0)]^2.$$


Therefore

$$E[(X(t) - X(t_0))^2] \ge E[X(t) - X(t_0)]^2 = (m_X(t) - m_X(t_0))^2.$$


If X(t) is mean square continuous at $t_0$, then as $t \to t_0$ the left-hand side of the above
equation approaches zero. This implies that the right-hand side approaches zero, and
hence $m_X(t) \to m_X(t_0)$. Equation (9.77a) can be rewritten as follows:


$$\lim_{t \to t_0} E[X(t)] = E\left[\mathop{\mathrm{l.i.m.}}_{t \to t_0} X(t)\right]. \tag{9.77b}$$


Therefore if X(t) is mean square continuous at $t_0$, then we can interchange the order of
the limit and the expected value.
If the random process X(t) is wide-sense stationary, then Eq. (9.76) becomes


$$E[(X(t + \tau) - X(t))^2] = 2(R_X(0) - R_X(\tau)). \tag{9.78}$$


Therefore if $R_X(\tau)$ is continuous at $\tau = 0$, then the wide-sense stationary random
process X(t) is mean square continuous at every point $t_0$.


Example 9.40 Wiener and Poisson Processes 


Are the Wiener and Poisson processes mean square continuous? 
The autocorrelation of the Wiener process X(t) is given by

$$R_X(t_1, t_2) = \alpha \min(t_1, t_2).$$

Consider the limit as $t_1$ and $t_2$ approach $(t_0, t_0)$:

$$|R_X(t_0 + \varepsilon_1, t_0 + \varepsilon_2) - R_X(t_0, t_0)| = \alpha\,|\min(t_0 + \varepsilon_1, t_0 + \varepsilon_2) - t_0| \le \alpha \max(\varepsilon_1, \varepsilon_2).$$

As $\varepsilon_1$ and $\varepsilon_2$ approach zero, the above difference vanishes. Therefore the autocorrelation function is continuous at the point $(t_0, t_0)$, and the Wiener process is mean square continuous.
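The autocovariance of the Poisson process N(t) is given by

$$C_N(t_1, t_2) = \lambda \min(t_1, t_2).$$

This has exactly the same form as the corresponding function of the Wiener process, and the autocorrelation $R_N(t_1, t_2) = C_N(t_1, t_2) + \lambda^2 t_1 t_2$ adds only a continuous term. Therefore the Poisson process is also mean
square continuous.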


The above example shows clearly how mean square continuity does not imply
continuity of the sample functions. The Poisson and Wiener processes have
covariance functions of the same form and are both mean square continuous. However, the Poisson
process has a countably infinite number of discontinuities, while it can be shown that
almost all sample functions of the Wiener process are continuous.


Example 9.41 Pulse Amplitude Modulation 


Let X(t) be the pulse amplitude modulation random process introduced in Example 9.38. Is X(t) 
mean square continuous? 
The process has a discontinuity at t = mT with nonzero probability, so we expect the
process not to be mean square continuous. The autocorrelation function of X(t) is shown in Fig.
9.15 and is given by

$$R_X(t_1, t_2) = \begin{cases} 1 & nT \le t_1 < (n+1)T \text{ and } nT \le t_2 < (n+1)T \\ 0 & \text{otherwise.} \end{cases}$$


The autocorrelation function is continuous at all points $t_1 = t_2 \ne nT$, and hence X(t) is mean
square continuous at all points within the signaling intervals, nT < t < (n + 1)T. However,
the autocorrelation function is not continuous at the points $t_1 = t_2 = nT$, which correspond to
the points where the transitions between pulses occur. For example, if we approach the point
(nT, nT) along the line $t_1 = t_2$, we obtain the limit 1; if we approach (nT, nT) along a line perpendicular to the above, the limit is zero. Thus X(t) is not mean square continuous at the point
t = nT.
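
The two modes of approach in Example 9.41 can be checked numerically. The following is a minimal sketch, assuming unit-variance, zero-mean pulse amplitudes so that the autocorrelation is 1 inside a common signaling interval and 0 otherwise; the function name and the values of T and n are illustrative choices, not part of the example.

```python
import math

def pam_autocorrelation(t1, t2, T=1.0):
    """R_X(t1, t2) for the PAM process: 1 when t1 and t2 fall in the same
    signaling interval [nT, (n+1)T), and 0 otherwise."""
    return 1.0 if math.floor(t1 / T) == math.floor(t2 / T) else 0.0

T, n = 1.0, 3                                  # assumed interval length and index
steps = [10.0 ** (-k) for k in range(1, 7)]    # offsets shrinking toward zero

# Approach (nT, nT) along the line t1 = t2, from inside the interval.
along_diagonal = [pam_autocorrelation(n * T + e, n * T + e, T) for e in steps]

# Approach (nT, nT) along the perpendicular line through (nT, nT).
along_perpendicular = [pam_autocorrelation(n * T + e, n * T - e, T) for e in steps]

print(along_diagonal)
print(along_perpendicular)
```

The first list stays at 1 and the second at 0 however small the offset, which is the direction-dependent limit that rules out continuity of the autocorrelation at (nT, nT).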


9.7.2 Mean Square Derivatives 


Suppose we take a sample function of a random process $X(t, \zeta)$ and carry out the limiting procedure that defines the derivative of a deterministic function:

$$\lim_{\varepsilon \to 0} \frac{X(t + \varepsilon, \zeta) - X(t, \zeta)}{\varepsilon}.$$


This limit may exist for some sample functions and it may fail to exist for other sample 
functions of the same random process. We define the derivative of a random process in 
terms of mean square convergence: 


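Mean Square Derivative: The random process X(t) has mean square derivative X'(t) at t defined by

$$X'(t) = \operatorname*{l.i.m.}_{\varepsilon \to 0} \frac{X(t + \varepsilon) - X(t)}{\varepsilon} \tag{9.79}$$

provided that the mean square limit exists, that is,

$$\lim_{\varepsilon \to 0} E\left[\left(\frac{X(t + \varepsilon) - X(t)}{\varepsilon} - X'(t)\right)^2\right] = 0. \tag{9.80}$$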


We also denote the mean square derivative by dX(t)/dt. Note that if all sample func- 
tions of X(t) are differentiable at the point t, then the mean square derivative exists be- 
cause Eq. (9.80) is satisfied. However, the existence of the mean square derivative does 
not imply the existence of the derivative for all sample functions. 

It can be shown that the mean square derivative of X(t) at the point t exists if

$$\frac{\partial^2}{\partial t_1\,\partial t_2} R_X(t_1, t_2)$$

exists at the point $(t_1, t_2) = (t, t)$. We examine the special case where X(t) is WSS. Consider the mean square value of the first difference in X(t):

$$\begin{aligned}
E\left[\left(\frac{X(t+h) - X(t)}{h}\right)^2\right] &= \frac{1}{h^2}\left(R_X(0) - R_X(h) - R_X(-h) + R_X(0)\right) \\
&= -\frac{1}{h}\left\{\frac{R_X(h) - R_X(0)}{h} - \frac{R_X(0) - R_X(-h)}{h}\right\} \\
&\to -\frac{d^2}{d\tau^2} R_X(\tau)\Big|_{\tau = 0}.
\end{aligned} \tag{9.81}$$

Therefore the mean square derivative of a WSS random process X(t) exists for all t if
$R_X(\tau)$ has derivatives up to order two at $\tau = 0$.
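
The limit in Eq. (9.81) can be explored numerically. The sketch below evaluates the mean square value of the first difference for two autocorrelation functions chosen purely for illustration (they are assumptions, not taken from the text): a Gaussian-shaped one that is twice differentiable at the origin, and an exponential one with a cusp at the origin.

```python
import math

def mean_square_difference_quotient(R, h):
    """E[((X(t+h) - X(t)) / h)^2] for a WSS process with autocorrelation R(tau)."""
    return (2.0 * R(0.0) - R(h) - R(-h)) / h ** 2

def smooth_R(tau):
    # Assumed example: twice differentiable at tau = 0, with -R''(0) = 2.
    return math.exp(-tau ** 2)

def cusp_R(tau):
    # Assumed example: continuous but not differentiable at tau = 0.
    return math.exp(-abs(tau))

for h in (1e-1, 1e-2, 1e-3):
    print(h,
          mean_square_difference_quotient(smooth_R, h),
          mean_square_difference_quotient(cusp_R, h))
```

For the smooth case the quotient settles near 2, the value of the negated second derivative at the origin. For the cusp case it grows roughly like 2/h, so no finite limit exists.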

If X(t) is a Gaussian random process for which the mean square derivative
X'(t) exists, then X'(t) must also be a Gaussian random process. To show this, consider $Y_\varepsilon(t) = (X(t + \varepsilon) - X(t))/\varepsilon$. The k time samples $Y_\varepsilon(t_1), Y_\varepsilon(t_2), \ldots, Y_\varepsilon(t_k)$ are
given by a linear transformation of the jointly Gaussian random variables $X(t_1 + \varepsilon)$,
$X(t_1)$, $X(t_2 + \varepsilon)$, $X(t_2), \ldots, X(t_k + \varepsilon)$, $X(t_k)$. It then follows that $Y_\varepsilon(t_1), Y_\varepsilon(t_2), \ldots,
Y_\varepsilon(t_k)$ are jointly Gaussian random variables and hence that $Y_\varepsilon(t)$ is a Gaussian random process. X'(t), the limit of $Y_\varepsilon(t)$ as $\varepsilon$ approaches zero, is then also a Gaussian random process since (from Section 7.4) mean square convergence implies convergence in
distribution.

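Once we have determined the existence of the mean square derivative X'(t), we
can proceed to find its mean and autocorrelation functions. Using the same reasoning
that led to Eq. (9.77b), we can show that we can interchange the order of expectation
and mean square differentiation. Therefore

$$\begin{aligned}
E[X'(t)] &= E\left[\operatorname*{l.i.m.}_{\varepsilon \to 0} \frac{X(t + \varepsilon) - X(t)}{\varepsilon}\right] \\
&= \lim_{\varepsilon \to 0} E\left[\frac{X(t + \varepsilon) - X(t)}{\varepsilon}\right] \\
&= \lim_{\varepsilon \to 0} \frac{m_X(t + \varepsilon) - m_X(t)}{\varepsilon} = \frac{d}{dt} m_X(t).
\end{aligned} \tag{9.82}$$

Note that if X(t) is a wide-sense stationary process, then $m_X(t) = m$, a constant, and
therefore E[X'(t)] = 0.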
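Next we find the cross-correlation between X(t) and X'(t):

$$\begin{aligned}
R_{X,X'}(t_1, t_2) &= E\left[X(t_1) \operatorname*{l.i.m.}_{\varepsilon \to 0} \frac{X(t_2 + \varepsilon) - X(t_2)}{\varepsilon}\right] \\
&= \lim_{\varepsilon \to 0} \frac{R_X(t_1, t_2 + \varepsilon) - R_X(t_1, t_2)}{\varepsilon} \\
&= \frac{\partial}{\partial t_2} R_X(t_1, t_2).
\end{aligned}$$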
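Finally, we obtain the autocorrelation of X'(t):

$$\begin{aligned}
R_{X'}(t_1, t_2) &= E\left[\operatorname*{l.i.m.}_{\varepsilon \to 0} \left\{\frac{X(t_1 + \varepsilon) - X(t_1)}{\varepsilon}\right\} X'(t_2)\right] \\
&= \lim_{\varepsilon \to 0} \frac{R_{X,X'}(t_1 + \varepsilon, t_2) - R_{X,X'}(t_1, t_2)}{\varepsilon} \\
&= \frac{\partial}{\partial t_1} R_{X,X'}(t_1, t_2) = \frac{\partial^2}{\partial t_1\,\partial t_2} R_X(t_1, t_2).
\end{aligned} \tag{9.83}$$

If X(t) is a wide-sense stationary process, we have

$$R_{X,X'}(\tau) = \frac{\partial}{\partial t_2} R_X(t_1 - t_2) = -\frac{d}{d\tau} R_X(\tau), \tag{9.84}$$

where $\tau = t_1 - t_2$, and then

$$R_{X'}(\tau) = \frac{\partial}{\partial t_1}\left\{-\frac{d}{d\tau} R_X(t_1 - t_2)\right\} = -\frac{d^2}{d\tau^2} R_X(\tau). \tag{9.85}$$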


Example 9.42 


Let X(t) be the random amplitude sinusoid introduced in Example 9.9. Does X(t) have a mean 
square derivative? 
The autocorrelation function for X(t) is

$$R_X(t_1, t_2) = E[A^2] \cos 2\pi t_1 \cos 2\pi t_2.$$

The second mixed partial derivative with respect to $t_1$ and $t_2$ exists at every point (t, t), and is given by

$$\frac{\partial^2}{\partial t_1\,\partial t_2} R_X(t_1, t_2)\Big|_{t_1 = t_2 = t} = 4\pi^2 E[A^2] \sin^2 2\pi t.$$


Therefore X(t) has a mean square derivative at every point t. 


Example 9.43 Wiener Process and White Gaussian Noise 


Does the Wiener process have a mean square derivative? 

Recall that the Wiener process is Gaussian, so we expect that its derivative is also Gaussian. We first show that this process does not have a mean square derivative. The Wiener process
has autocorrelation function given by

$$R_X(t_1, t_2) = \alpha \min(t_1, t_2) = \begin{cases} \alpha t_2 & t_2 < t_1 \\ \alpha t_1 & t_2 \ge t_1. \end{cases}$$

The first derivative with respect to $t_2$ is

$$\frac{\partial}{\partial t_2} R_X(t_1, t_2) = \begin{cases} \alpha & t_2 < t_1 \\ 0 & t_2 > t_1 \end{cases} = \alpha\, u(t_1 - t_2).$$

The derivative of a step function does not exist at its point of discontinuity. We therefore conclude that the second mixed partial derivative does not exist at any point t, and hence the Wiener
process does not have a mean square derivative at any point.

We can generalize the notion of derivative of a random process if we use delta functions.
Recall that the delta function is defined so that its integral is a unit step function (see Eq. 4.18).
We can therefore interpret the derivative of a unit step function as yielding a delta function. This
suggests that the process X'(t) has autocorrelation function given by

$$R_{X'}(t_1, t_2) = \frac{\partial}{\partial t_1} \alpha\, u(t_1 - t_2) = \alpha\, \delta(t_1 - t_2). \tag{9.86}$$

The properties of the delta function give the random process X'(t) some unusual properties.
First, since the delta function is infinite at $t_1 = t_2$, it follows that the mean square value of X'(t)
is infinite, that is, X'(t) has infinite power. Also, since the delta function is zero whenever $t_1 \ne t_2$,
it follows that any two distinct time samples, $X'(t_1)$ and $X'(t_2)$, are uncorrelated regardless of
how close $t_1$ is to $t_2$. This suggests that X'(t) varies extremely rapidly in time. Recall that the
Wiener process was obtained in Section 9.5 as the limit of the random walk process. Thus it is not
surprising that the derivative of the process has these properties.
The random process that results from taking the derivative of the Wiener process is called
white Gaussian noise. It is very useful in modeling broadband noise in communication and radar
systems. We discuss it further in the next chapter.


9.7.3 Mean Square Integrals


The integral of a continuous-time random process arises naturally when computing 
time averages. It also arises as the solution to systems described by linear differential 
equations. In this section, we develop the notion of the integral of a random process in 
the sense of mean square convergence. 

Suppose we are interested in the integral of the random process X(t) over the interval $(t_0, t)$. We partition the interval into n subintervals and form the sum

$$I_n = \sum_{k=1}^{n} X(t_k)\,\Delta_k.$$

We define the integral of X(t) as the mean square limit of the sequence $I_n$ as the width
of the subintervals approaches zero. When the limit exists, we denote the limiting random process by

$$Y(t) = \int_{t_0}^{t} X(t')\,dt' = \operatorname*{l.i.m.}_{\Delta_k \to 0} \sum_{k} X(t_k)\,\Delta_k. \tag{9.87}$$

The Cauchy criterion provides us with conditions that ensure the existence of the
mean square integral in Eq. (9.87):

$$E\left[\left\{\sum_{j} X(t_j)\,\Delta_j - \sum_{k} X(t_k)\,\Delta_k\right\}^2\right] \to 0 \quad \text{as } \Delta_j, \Delta_k \to 0. \tag{9.88}$$

As in the case of the mean square derivative, we obtain three terms when we expand the
square inside the expected value. Each of these terms leads to an expression of the form

$$E\left[\sum_{j} \sum_{k} X(t_j) X(t_k)\,\Delta_j \Delta_k\right] = \sum_{j} \sum_{k} R_X(t_j, t_k)\,\Delta_j \Delta_k. \tag{9.89}$$

If the limit of the expression on the right-hand side exists, then it can be shown that the
three terms resulting from Eq. (9.88) add to zero. On the other hand, the limit of the
right-hand side of Eq. (9.89) approaches a double integral of the autocorrelation function. We have thus shown that the mean square integral of X(t) exists if the following
double integral exists:

$$\int_{t_0}^{t} \int_{t_0}^{t} R_X(u, v)\,du\,dv. \tag{9.90}$$

It can be shown that if X(t) is a mean square continuous random process, then its integral
exists.

If X(t) is a Gaussian random process, then its integral Y(t) is also a Gaussian random process. This follows from the fact that the $I_n$'s are linear combinations of jointly
Gaussian random variables.

The mean and autocorrelation function of Y(t) are given by

$$m_Y(t) = E\left[\int_{t_0}^{t} X(t')\,dt'\right] = \int_{t_0}^{t} E[X(t')]\,dt' = \int_{t_0}^{t} m_X(t')\,dt' \tag{9.91}$$

and

$$R_Y(t_1, t_2) = E\left[\int_{t_0}^{t_1} X(u)\,du \int_{t_0}^{t_2} X(v)\,dv\right] = \int_{t_0}^{t_1} \int_{t_0}^{t_2} R_X(u, v)\,du\,dv. \tag{9.92}$$

Finally, we note that if X(t) is wide-sense stationary, then the integrands in Eqs.
(9.90) and (9.92) are replaced by $R_X(u - v)$.


Example 9.44 Moving Average of X(t) 


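Find the mean and variance of M(t), the moving average over half a period of a random amplitude sinusoid X(t) with period T:

$$M(t) = \frac{2}{T} \int_{t - T/2}^{t} X(t')\,dt'.$$

The mean of M(t) is given by

$$E[M(t)] = \frac{2}{T} \int_{t - T/2}^{t} E[A] \cos\frac{2\pi t'}{T}\,dt' = E[A]\,\frac{2}{\pi} \sin\frac{2\pi t}{T}.$$

Its second moment at time t is given by

$$E[M^2(t)] = R_M(t, t) = \frac{4}{T^2} E[A^2] \int_{t - T/2}^{t} \int_{t - T/2}^{t} \cos\frac{2\pi u}{T} \cos\frac{2\pi v}{T}\,du\,dv = E[A^2]\,\frac{4}{\pi^2} \sin^2\frac{2\pi t}{T}.$$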

The variance is then

$$\mathrm{VAR}[M(t)] = E[A^2]\,\frac{4}{\pi^2} \sin^2\frac{2\pi t}{T} - E[A]^2\,\frac{4}{\pi^2} \sin^2\frac{2\pi t}{T} = \mathrm{VAR}[A]\,\frac{4}{\pi^2} \sin^2\frac{2\pi t}{T}.$$
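
The closed forms in Example 9.44 can be checked with a simple quadrature. This is a minimal sketch, assuming X(t) = A cos(2πt/T) so that M(t) equals A times a deterministic factor; the midpoint rule, the function name, and the numerical values of T, t, and the moments of A are illustrative assumptions.

```python
import math

def half_period_average_of_cosine(t, T, steps=20000):
    """(2/T) * integral of cos(2*pi*t'/T) over (t - T/2, t), by the midpoint rule."""
    a, b = t - T / 2.0, t
    dt = (b - a) / steps
    total = sum(math.cos(2.0 * math.pi * (a + (i + 0.5) * dt) / T)
                for i in range(steps))
    return (2.0 / T) * total * dt

T, t = 2.0, 0.3                         # assumed period and observation time
mean_A, second_moment_A = 0.5, 1.25     # assumed E[A] and E[A^2], so VAR[A] = 1

g_numeric = half_period_average_of_cosine(t, T)
g_closed = (2.0 / math.pi) * math.sin(2.0 * math.pi * t / T)

# M(t) = A * g(t), so its mean and variance scale with g and g^2.
mean_M = mean_A * g_numeric
variance_M = (second_moment_A - mean_A ** 2) * g_numeric ** 2

print(g_numeric, g_closed)
print(mean_M, variance_M)
```

The numerical factor agrees with (2/π) sin(2πt/T), and the variance follows as VAR[A] times the square of that factor.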


Example 9.45 Integral of White Gaussian Noise 


Let Z(t) be the white Gaussian noise process introduced in Example 9.43. Find the autocorrelation function of X(t), the integral of Z(t) over the interval (0, t).
From Example 9.43, the white Gaussian noise process has autocorrelation function

$$R_Z(t_1, t_2) = \alpha\, \delta(t_1 - t_2).$$

The autocorrelation function of X(t) is then given by

$$\begin{aligned}
R_X(t_1, t_2) &= \int_0^{t_1} \int_0^{t_2} \alpha\, \delta(w - v)\,dw\,dv = \alpha \int_0^{t_2} u(t_1 - v)\,dv \\
&= \alpha \int_0^{\min(t_1, t_2)} dv = \alpha \min(t_1, t_2).
\end{aligned}$$

We thus find that X(t) has the same autocorrelation as the Wiener process. In addition we have
that X(t) must be a Gaussian random process since Z(t) is Gaussian. It then follows that X(t)
must be the Wiener process because it has the joint pdf given by Eq. (9.52).
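
A discretized simulation gives a rough check of the result of Example 9.45. The sketch below is an illustration under stated assumptions, not the text's construction: the integral of white noise over each step of width dt is modeled as an independent zero-mean Gaussian increment with variance α·dt, and the step size, number of paths, seed, and function names are arbitrary choices.

```python
import random

def simulate_integrated_noise(alpha, times, dt, rng):
    """One sample path of the integrated noise, observed at increasing times.

    Each step of width dt contributes an independent N(0, alpha * dt) increment
    (an assumed discretization of the white noise integral)."""
    samples, x, t = [], 0.0, 0.0
    for target in times:
        for _ in range(round((target - t) / dt)):
            x += rng.gauss(0.0, (alpha * dt) ** 0.5)
        t = target
        samples.append(x)
    return samples

def estimate_autocorrelation(alpha, t1, t2, dt=0.05, paths=20000, seed=1):
    """Ensemble-average estimate of E[X(t1) X(t2)] for t1 <= t2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        x1, x2 = simulate_integrated_noise(alpha, [t1, t2], dt, rng)
        total += x1 * x2
    return total / paths

alpha, t1, t2 = 1.5, 0.5, 1.0
estimate = estimate_autocorrelation(alpha, t1, t2)
print(estimate, alpha * min(t1, t2))
```

The estimate should fall close to α min(t1, t2), up to Monte Carlo error that shrinks as the number of paths grows.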


9.7.4 Response of a Linear System to Random Input


We now apply the results developed in this section to develop the solution of a linear 
system described by a first-order differential equation. The method can be generalized 
to higher-order equations. In the next chapter we develop transform methods to solve 
the general problem. 

Consider a linear system described by the first-order differential equation:

$$X'(t) + \alpha X(t) = Z(t), \quad t \ge 0,\ X(0) = 0. \tag{9.93}$$

For example, X(t) may represent the voltage across the capacitor of an RC circuit with
current input Z(t). We now show how to obtain $m_X(t)$ and $R_X(t_1, t_2)$. If the input
process Z(t) is Gaussian, then the output process will also be Gaussian. Therefore, in
the case of Gaussian input processes, we can then characterize the joint pdf of the output process.




We obtain a differential equation for $m_X(t)$ by taking the expected value of
Eq. (9.93):

$$E[X'(t)] + \alpha E[X(t)] = m_X'(t) + \alpha\, m_X(t) = m_Z(t), \quad t \ge 0 \tag{9.94}$$

with initial condition $m_X(0) = E[X(0)] = 0$.
As an intermediate step we next find a differential equation for $R_{Z,X}(t_1, t_2)$. If we
multiply Eq. (9.93) by $Z(t_1)$ and take the expected value, we obtain

$$E[Z(t_1) X'(t_2)] + \alpha E[Z(t_1) X(t_2)] = E[Z(t_1) Z(t_2)], \quad t_2 \ge 0$$

with initial condition $E[Z(t_1) X(0)] = 0$ since X(0) = 0. The same derivation that led
to the cross-correlation between X(t) and X'(t) (see Eq. 9.83) can be used to show that

$$E[Z(t_1) X'(t_2)] = \frac{\partial}{\partial t_2} R_{Z,X}(t_1, t_2).$$

Thus we obtain the following differential equation:

$$\frac{\partial}{\partial t_2} R_{Z,X}(t_1, t_2) + \alpha R_{Z,X}(t_1, t_2) = R_Z(t_1, t_2), \quad t_2 \ge 0 \tag{9.95}$$

with initial condition $R_{Z,X}(t_1, 0) = 0$.

Finally we obtain a differential equation for $R_X(t_1, t_2)$. Multiply Eq. (9.93) by
$X(t_2)$ and take the expected value:

$$E[X'(t_1) X(t_2)] + \alpha E[X(t_1) X(t_2)] = E[Z(t_1) X(t_2)], \quad t_1 \ge 0$$

with initial condition $E[X(0) X(t_2)] = 0$. This leads to the differential equation

$$\frac{\partial}{\partial t_1} R_X(t_1, t_2) + \alpha R_X(t_1, t_2) = R_{Z,X}(t_1, t_2), \quad t_1 \ge 0 \tag{9.96}$$

with initial condition $R_X(0, t_2) = 0$. Note that the solution to Eq. (9.95) appears as
the forcing function in Eq. (9.96). Thus we conclude that by solving the differential
equations in Eqs. (9.94), (9.95), and (9.96) we obtain the mean and autocorrelation
function for X(t).


Example 9.46 Ornstein-Uhlenbeck Process 


Equation (9.93) with the input given by a zero-mean, white Gaussian noise process is called the 
Langevin equation, after the scientist who formulated it in 1908 to describe the Brownian motion 
of a free particle. In this formulation X(t) represents the velocity of the particle, so that Eq. (9.93) 
results from equating the acceleration of the particle X'(t) to the force on the particle due to 
friction —aX(t) and the force due to random collisions Z(t). We present the solution developed 
by Uhlenbeck and Ornstein in 1930. 

First, we note that since the input process Z(t) is Gaussian, the output process X(t) will
also be a Gaussian random process. Next we recall that the first-order differential equation

$$x'(t) + \alpha x(t) = g(t), \quad t \ge 0,\ x(0) = 0$$

has solution

$$x(t) = \int_0^t e^{-\alpha(t - \tau)} g(\tau)\,d\tau, \quad t \ge 0.$$

Therefore the solution to Eq. (9.94) is

$$m_X(t) = \int_0^t e^{-\alpha(t - \tau)} m_Z(\tau)\,d\tau = 0.$$

The autocorrelation of the white Gaussian noise process is

$$R_Z(t_1, t_2) = \sigma^2 \delta(t_1 - t_2).$$

Equation (9.95) is also a first-order differential equation, and it has solution

$$\begin{aligned}
R_{Z,X}(t_1, t_2) &= \int_0^{t_2} e^{-\alpha(t_2 - \tau)} R_Z(t_1, \tau)\,d\tau \\
&= \sigma^2 \int_0^{t_2} e^{-\alpha(t_2 - \tau)} \delta(t_1 - \tau)\,d\tau \\
&= \sigma^2 e^{-\alpha(t_2 - t_1)} u(t_2 - t_1),
\end{aligned}$$

where u(x) is the unit step function.


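The autocorrelation function of the output process X(t) is the solution to the first-order
differential equation Eq. (9.96). The solution is given by

$$\begin{aligned}
R_X(t_1, t_2) &= \int_0^{t_1} e^{-\alpha(t_1 - \tau)} R_{Z,X}(\tau, t_2)\,d\tau \\
&= \sigma^2 \int_0^{t_1} e^{-\alpha(t_1 - \tau)} e^{-\alpha(t_2 - \tau)} u(t_2 - \tau)\,d\tau \\
&= \sigma^2 \int_0^{\min(t_1, t_2)} e^{-\alpha(t_1 - \tau)} e^{-\alpha(t_2 - \tau)}\,d\tau \\
&= \frac{\sigma^2}{2\alpha}\left(e^{-\alpha|t_1 - t_2|} - e^{-\alpha(t_1 + t_2)}\right), \quad t_1 \ge 0,\ t_2 \ge 0.
\end{aligned} \tag{9.97a}$$

A Gaussian random process with this autocorrelation function is called an Ornstein-Uhlenbeck process. Thus we conclude that the output process X(t) is an Ornstein-Uhlenbeck
process.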


If we let $t_1 = t$ and $t_2 = t + \tau$, then as t approaches infinity,

$$R_X(t + \tau, t) \to \frac{\sigma^2}{2\alpha} e^{-\alpha|\tau|}. \tag{9.97b}$$

This shows that the effect of the zero initial condition dies out as time progresses, and the process
becomes wide-sense stationary. Since the process is Gaussian, this also implies that the process
becomes strict-sense stationary.
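
The closed form in Eq. (9.97a) and the limit in Eq. (9.97b) can be checked numerically. The sketch below is a minimal illustration: it compares the closed form with a midpoint-rule evaluation of the integral that precedes it, using assumed parameter values and hypothetical function names.

```python
import math

def ou_autocorrelation(t1, t2, alpha, sigma2):
    """Closed form of Eq. (9.97a) for t1, t2 >= 0."""
    return sigma2 / (2.0 * alpha) * (math.exp(-alpha * abs(t1 - t2))
                                     - math.exp(-alpha * (t1 + t2)))

def ou_autocorrelation_by_quadrature(t1, t2, alpha, sigma2, steps=20000):
    """sigma^2 * integral over (0, min(t1, t2)) of
    exp(-alpha*(t1 - tau)) * exp(-alpha*(t2 - tau)) d tau, midpoint rule."""
    d = min(t1, t2) / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * d
        total += math.exp(-alpha * (t1 - tau)) * math.exp(-alpha * (t2 - tau))
    return sigma2 * total * d

def stationary_autocorrelation(tau, alpha, sigma2):
    """Limiting form of Eq. (9.97b)."""
    return sigma2 / (2.0 * alpha) * math.exp(-alpha * abs(tau))

alpha, sigma2 = 2.0, 3.0        # assumed parameter values
print(ou_autocorrelation(1.0, 1.5, alpha, sigma2),
      ou_autocorrelation_by_quadrature(1.0, 1.5, alpha, sigma2))
print(ou_autocorrelation(20.5, 20.0, alpha, sigma2),
      stationary_autocorrelation(0.5, alpha, sigma2))
```

For large t the second exponential in Eq. (9.97a) is negligible, so the value depends only on the time difference, as the stationary limit predicts.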

9.8 TIME AVERAGES OF RANDOM PROCESSES AND ERGODIC THEOREMS


At some point, the parameters of a random process must be obtained through measurement. The results from Chapter 7 and the statistical methods of Chapter 8 suggest
that we repeat the random experiment that gives rise to the random process a large
number of times and take the arithmetic average of the quantities of interest. For example, to estimate the mean $m_X(t)$ of a random process $X(t, \zeta)$, we repeat the random
experiment and take the following average:

$$\hat{m}_X(t) = \frac{1}{N} \sum_{i=1}^{N} X(t, \zeta_i), \tag{9.98}$$

where N is the number of repetitions of the experiment, and $X(t, \zeta_i)$ is the realization
observed in the ith repetition.

In some situations, we are interested in estimating the mean or autocorrelation
functions from the time average of a single realization, that is,

$$\langle X(t) \rangle_T = \frac{1}{2T} \int_{-T}^{T} X(t, \zeta)\,dt. \tag{9.99}$$


An ergodic theorem states conditions under which a time average converges as the ob- 
servation interval becomes large. In this section, we are interested in ergodic theorems 
that state when time averages converge to the ensemble average (expected value). 

The strong law of large numbers, presented in Chapter 7, is one of the most important ergodic theorems. It states that if $X_n$ is an iid discrete-time random process
with finite mean $E[X_n] = m$, then the time average of the samples converges to the
ensemble average with probability one:

$$P\left[\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = m\right] = 1. \tag{9.100}$$


This result allows us to estimate m by taking the time average of a single realization of 
the process. We are interested in obtaining results of this type for a larger class of ran- 
dom processes, that is, for non-iid, discrete-time random processes, and for continuous- 
time random processes. 

The following example shows that, in general, time averages do not converge to 
ensemble averages. 


Example 9.47 


Let X(t) = A for all t, where A is a zero-mean, unit-variance random variable. Find the limiting 
value of the time average. 
The mean of the process is $m_X(t) = E[X(t)] = E[A] = 0$. However, Eq. (9.99) gives

$$\langle X(t) \rangle_T = \frac{1}{2T} \int_{-T}^{T} A\,dt = A.$$

Thus the time-average mean does not always converge to $m_X(t) = 0$. Note that this process is
stationary. Thus this example shows that stationary processes need not be ergodic.

Consider the estimate given by Eq. (9.99) for $E[X(t)] = m_X(t)$. The estimate
yields a single number, so obviously it only makes sense to consider processes for
which $m_X(t) = m$, a constant. We now develop an ergodic theorem for the time average of wide-sense stationary processes.


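Let X(t) be a WSS process. The expected value of $\langle X(t) \rangle_T$ is

$$E[\langle X(t) \rangle_T] = E\left[\frac{1}{2T} \int_{-T}^{T} X(t)\,dt\right] = \frac{1}{2T} \int_{-T}^{T} E[X(t)]\,dt = m. \tag{9.101}$$

Equation (9.101) states that $\langle X(t) \rangle_T$ is an unbiased estimator for m.
Consider the variance of $\langle X(t) \rangle_T$:

$$\begin{aligned}
\mathrm{VAR}[\langle X(t) \rangle_T] &= E[(\langle X(t) \rangle_T - m)^2] \\
&= E\left[\left\{\frac{1}{2T} \int_{-T}^{T} (X(t) - m)\,dt\right\}\left\{\frac{1}{2T} \int_{-T}^{T} (X(t') - m)\,dt'\right\}\right] \\
&= \frac{1}{4T^2} \int_{-T}^{T} \int_{-T}^{T} C_X(t, t')\,dt\,dt'.
\end{aligned} \tag{9.102}$$

Since the process X(t) is WSS, Eq. (9.102) becomes

$$\mathrm{VAR}[\langle X(t) \rangle_T] = \frac{1}{4T^2} \int_{-T}^{T} \int_{-T}^{T} C_X(t - t')\,dt\,dt'. \tag{9.103}$$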


Figure 9.17 shows the region of integration for this integral. The integrand is constant
along the line $u = t - t'$ for $-2T < u < 2T$, so we can evaluate the integral as the
sums of infinitesimal strips as shown in the figure. It can be shown that each strip has area
$(2T - |u|)\,du$, so the contribution of each strip to the integral is $(2T - |u|)\,C_X(u)\,du$.
Thus

$$\begin{aligned}
\mathrm{VAR}[\langle X(t) \rangle_T] &= \frac{1}{4T^2} \int_{-2T}^{2T} (2T - |u|)\,C_X(u)\,du \\
&= \frac{1}{2T} \int_{-2T}^{2T} \left(1 - \frac{|u|}{2T}\right) C_X(u)\,du.
\end{aligned} \tag{9.104}$$

FIGURE 9.17
Region of integration for integral in Eq. (9.102).

Therefore, $\langle X(t) \rangle_T$ will approach m in the mean square sense, that is, $E[(\langle X(t) \rangle_T - m)^2] \to 0$, if the expression in Eq. (9.104) approaches zero with increasing T. We have
just proved the following ergodic theorem.


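Theorem
Let X(t) be a WSS process with $m_X(t) = m$, then

$$\lim_{T \to \infty} \langle X(t) \rangle_T = m$$

in the mean square sense, if and only if

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-2T}^{2T} \left(1 - \frac{|u|}{2T}\right) C_X(u)\,du = 0.$$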


In keeping with engineering usage, we say that a WSS process is mean ergodic if it sat- 
isfies the conditions of the above theorem. 

The above theorem can be used to obtain ergodic theorems for the time average
of other quantities. For example, if we replace X(t) with $Y(t + \tau)Y(t)$ in Eq. (9.99), we
obtain a time-average estimate for the autocorrelation function of the process Y(t):

$$\langle Y(t + \tau) Y(t) \rangle_T = \frac{1}{2T} \int_{-T}^{T} Y(t + \tau) Y(t)\,dt. \tag{9.105}$$

It is easily shown that $E[\langle Y(t + \tau) Y(t) \rangle_T] = R_Y(\tau)$ if Y(t) is WSS. The above ergodic theorem then implies that the time-average autocorrelation converges to $R_Y(\tau)$ in the mean
square sense if the term in Eq. (9.104) with X(t) replaced by $Y(t) Y(t + \tau)$ converges to zero.


Example 9.48 


Is the random telegraph process mean ergodic? 
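The covariance function for the random telegraph process is $C_X(\tau) = e^{-2\alpha|\tau|}$, so the variance of $\langle X(t) \rangle_T$ is

$$\begin{aligned}
\mathrm{VAR}[\langle X(t) \rangle_T] &= \frac{2}{2T} \int_0^{2T} \left(1 - \frac{u}{2T}\right) e^{-2\alpha u}\,du \\
&< \frac{1}{T} \int_0^{2T} e^{-2\alpha u}\,du = \frac{1 - e^{-4\alpha T}}{2\alpha T}.
\end{aligned}$$

The bound approaches zero as $T \to \infty$, so $\mathrm{VAR}[\langle X(t) \rangle_T] \to 0$. Therefore the process is mean
ergodic.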
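
The variance expression in Eq. (9.104) and the bound of Example 9.48 can be compared numerically. The following is a minimal sketch using a midpoint rule; the value of α, the observation half-widths T, and the function names are illustrative assumptions.

```python
import math

def time_average_variance(C, T, steps=20000):
    """Eq. (9.104): (1/2T) * integral over (-2T, 2T) of (1 - |u|/2T) C(u) du.

    An even number of steps keeps the cusp at u = 0 on a cell boundary."""
    du = 4.0 * T / steps
    total = 0.0
    for i in range(steps):
        u = -2.0 * T + (i + 0.5) * du
        total += (1.0 - abs(u) / (2.0 * T)) * C(u)
    return total * du / (2.0 * T)

alpha = 1.0   # assumed transition rate of the random telegraph process

def telegraph_covariance(u):
    return math.exp(-2.0 * alpha * abs(u))

def variance_bound(T):
    """Upper bound from Example 9.48."""
    return (1.0 - math.exp(-4.0 * alpha * T)) / (2.0 * alpha * T)

for T in (1.0, 10.0, 100.0):
    print(T, time_average_variance(telegraph_covariance, T), variance_bound(T))
```

Both the variance and its bound shrink roughly like 1/T, consistent with the conclusion that the random telegraph process is mean ergodic.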

If the random process under consideration is discrete-time, then the time-average
estimates for the mean and the autocorrelation functions of $X_n$ are given by

$$\langle X_n \rangle_T = \frac{1}{2T + 1} \sum_{n=-T}^{T} X_n \tag{9.106}$$

$$\langle X_{n+k} X_n \rangle_T = \frac{1}{2T + 1} \sum_{n=-T}^{T} X_{n+k} X_n. \tag{9.107}$$

If $X_n$ is a WSS random process, then $E[\langle X_n \rangle_T] = m$, and so $\langle X_n \rangle_T$ is an unbiased estimate for m. It is easy to show that the variance of $\langle X_n \rangle_T$ is

$$\mathrm{VAR}[\langle X_n \rangle_T] = \frac{1}{2T + 1} \sum_{k=-2T}^{2T} \left(1 - \frac{|k|}{2T + 1}\right) C_X(k). \tag{9.108}$$

Therefore, $\langle X_n \rangle_T$ approaches m in the mean square sense and is mean ergodic if the expression in Eq. (9.108) approaches zero with increasing T.


Example 9.49 Ergodicity and Exponential Correlation 


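Let $X_n$ be a wide-sense stationary discrete-time process with mean m and covariance function
$C_X(k) = \sigma^2 \rho^{|k|}$, for $|\rho| < 1$ and $k = 0, \pm 1, \pm 2, \ldots$. Show that $X_n$ is mean ergodic.
The variance of the sample mean (Eq. 9.106) is:

$$\begin{aligned}
\mathrm{VAR}[\langle X_n \rangle_T] &= \frac{1}{2T + 1} \sum_{k=-2T}^{2T} \left(1 - \frac{|k|}{2T + 1}\right) \sigma^2 \rho^{|k|} \\
&< \frac{2\sigma^2}{2T + 1} \sum_{k=0}^{\infty} \rho^k = \frac{2\sigma^2}{2T + 1}\,\frac{1}{1 - \rho}.
\end{aligned}$$

The bound on the right-hand side approaches zero as T increases and so $X_n$ is mean ergodic.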
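
Equation (9.108) is a finite sum, so the bound in Example 9.49 can be checked directly. This is a minimal sketch; the values of ρ, σ², and T and the function names are illustrative assumptions.

```python
def sample_mean_variance(C, T):
    """Eq. (9.108): variance of the sample mean over n = -T, ..., T."""
    N = 2 * T + 1
    return sum((1.0 - abs(k) / N) * C(k) for k in range(-2 * T, 2 * T + 1)) / N

rho, sigma2 = 0.9, 1.0          # assumed correlation parameter and variance

def exponential_covariance(k):
    return sigma2 * rho ** abs(k)

def variance_bound(T):
    """Upper bound from Example 9.49."""
    return 2.0 * sigma2 / ((2 * T + 1) * (1.0 - rho))

for T in (10, 100, 1000):
    print(T, sample_mean_variance(exponential_covariance, T), variance_bound(T))
```

The computed variance stays below the bound and both decrease as T grows, in line with the claim that the process is mean ergodic.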


Example 9.50 Ergodicity of Self-Similar Process and Long-Range Dependence 


Let $X_n$ be a wide-sense stationary discrete-time process with mean m and covariance function

$$C_X(k) = \frac{\sigma^2}{2}\left\{|k + 1|^{2H} - 2|k|^{2H} + |k - 1|^{2H}\right\} \tag{9.109}$$

for 1/2 < H < 1 and $k = 0, \pm 1, \pm 2, \ldots$. $X_n$ is said to be second-order self-similar. We will investigate the ergodicity of $X_n$.
We rewrite the variance of the sample mean in (Eq. 9.106) as follows:

$$\begin{aligned}
\mathrm{VAR}[\langle X_n \rangle_T] &= \frac{1}{(2T + 1)^2} \sum_{k=-2T}^{2T} (2T + 1 - |k|)\,C_X(k) \\
&= \frac{1}{(2T + 1)^2}\left\{(2T + 1)\,C_X(0) + 2(2T\,C_X(1)) + \cdots + 2\,C_X(2T)\right\}.
\end{aligned}$$

It is easy to show (see Problem 9.132) that the sum inside the braces is $\sigma^2 (2T + 1)^{2H}$. Therefore
the variance becomes:

$$\mathrm{VAR}[\langle X_n \rangle_T] = \frac{1}{(2T + 1)^2}\,\sigma^2 (2T + 1)^{2H} = \sigma^2 (2T + 1)^{2H - 2}. \tag{9.110}$$
The value of H, which is called the Hurst parameter, affects the convergence behavior of the sample mean. Note that if H = 1/2, the covariance function becomes $C_X(k) = \sigma^2 \delta_k$, which corresponds to an iid sequence. In this case, the variance becomes $\sigma^2/(2T + 1)$, which is the convergence
rate of the sample mean for iid samples. However, for H > 1/2, the variance becomes:

$$\mathrm{VAR}[\langle X_n \rangle_T] = \frac{\sigma^2}{2T + 1}\,(2T + 1)^{2H - 1}, \tag{9.111}$$

so the convergence of the sample mean is slower by a factor of $(2T + 1)^{2H - 1}$ than for iid
samples.

The slower convergence of the sample mean when H > 1/2 results from the long-range dependence of $X_n$. It can be shown that for large k, the covariance function is approximately given by:

$$C_X(k) \approx \sigma^2 H(2H - 1)\,k^{2H - 2}. \tag{9.112}$$

For 1/2 < H < 1, $C_X(k)$ decays as $1/k^{\alpha}$ where $0 < \alpha < 1$, which is a very slow decay rate. Thus
the dependence between values of $X_n$ decreases slowly and the process is said to have a long
memory or long-range dependence.
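
The identity behind Eq. (9.110) can be verified by direct summation. The sketch below evaluates the weighted sum of the covariance in Eq. (9.109) and compares it with the closed form; the chosen values of T and H and the function names are illustrative assumptions.

```python
def self_similar_covariance(k, H, sigma2=1.0):
    """Eq. (9.109): covariance of a second-order self-similar process."""
    return 0.5 * sigma2 * (abs(k + 1) ** (2 * H)
                           - 2 * abs(k) ** (2 * H)
                           + abs(k - 1) ** (2 * H))

def sample_mean_variance(T, H, sigma2=1.0):
    """Variance of the sample mean over n = -T, ..., T, by direct summation."""
    N = 2 * T + 1
    total = sum((N - abs(k)) * self_similar_covariance(k, H, sigma2)
                for k in range(-2 * T, 2 * T + 1))
    return total / N ** 2

T = 50
for H in (0.5, 0.7, 0.9):
    closed_form = (2 * T + 1) ** (2 * H - 2)     # Eq. (9.110) with sigma^2 = 1
    print(H, sample_mean_variance(T, H), closed_form)
```

With H = 1/2 the result reduces to 1/(2T + 1), the iid rate. Larger H gives a visibly larger variance for the same T, which is the slower convergence described above.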


9.9 FOURIER SERIES AND KARHUNEN-LOEVE EXPANSION


Let X(t) be a wide-sense stationary, mean square periodic random process with period
T, that is, $E[(X(t + T) - X(t))^2] = 0$. In order to simplify the development, we
assume that X(t) is zero mean. We show that X(t) can be represented in a mean square
sense by a Fourier series:

$$X(t) = \sum_{k=-\infty}^{\infty} X_k\, e^{j2\pi kt/T}, \tag{9.113}$$

where the coefficients are random variables defined by

$$X_k = \frac{1}{T} \int_0^T X(t')\, e^{-j2\pi kt'/T}\,dt'. \tag{9.114}$$

Equation (9.114) implies that, in general, the coefficients are complex-valued random
variables. For complex-valued random variables, the correlation between two random
variables X and Y is defined by $E[XY^*]$. We also show that the coefficients are orthogonal random variables, that is, $E[X_k X_m^*] = 0$ for $k \ne m$.

Recall that if X(t) is mean square periodic, then $R_X(\tau)$ is a periodic function in $\tau$
with period T. Therefore, it can be expanded in a Fourier series:

$$R_X(\tau) = \sum_{k=-\infty}^{\infty} a_k\, e^{j2\pi k\tau/T}, \tag{9.115}$$

where the coefficients $a_k$ are given by

$$a_k = \frac{1}{T} \int_0^T R_X(t')\, e^{-j2\pi kt'/T}\,dt'. \tag{9.116}$$

The coefficients $a_k$ appear in the following derivation.

First, we show that the coefficients in Eq. (9.113) are orthogonal random variables, that is, $E[X_k X_m^*] = 0$ for $k \ne m$:

$$E[X_k X_m^*] = E\left[X_k\, \frac{1}{T} \int_0^T X^*(t')\, e^{j2\pi mt'/T}\,dt'\right] = \frac{1}{T} \int_0^T E[X_k X^*(t')]\, e^{j2\pi mt'/T}\,dt'.$$

The integrand of the above equation has

$$\begin{aligned}
E[X_k X^*(t)] &= E\left[\frac{1}{T} \int_0^T X(u)\, e^{-j2\pi ku/T}\,du\; X^*(t)\right] \\
&= \frac{1}{T} \int_0^T R_X(u - t)\, e^{-j2\pi ku/T}\,du \\
&= \left\{\frac{1}{T} \int_{-t}^{T - t} R_X(v)\, e^{-j2\pi kv/T}\,dv\right\} e^{-j2\pi kt/T} \\
&= a_k\, e^{-j2\pi kt/T},
\end{aligned}$$

where we have used the fact that the Fourier coefficients can be calculated over any
full period. Therefore

$$E[X_k X_m^*] = \frac{1}{T} \int_0^T a_k\, e^{-j2\pi kt'/T}\, e^{j2\pi mt'/T}\,dt' = a_k\, \delta_{k,m}, \tag{9.117}$$

where $\delta_{k,m}$ is the Kronecker delta function. Thus $X_k$ and $X_m$ are orthogonal random
variables. Note that the above equation implies that $a_k = E[|X_k|^2]$, that is, the $a_k$ are
real-valued.


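To show that the Fourier series equals X(t) in the mean square sense, we take

$$\begin{aligned}
E\left[\left|X(t) - \sum_{k=-\infty}^{\infty} X_k\, e^{j2\pi kt/T}\right|^2\right]
&= E[|X(t)|^2] - E\left[X(t) \sum_{k=-\infty}^{\infty} X_k^*\, e^{-j2\pi kt/T}\right] \\
&\quad - E\left[X^*(t) \sum_{k=-\infty}^{\infty} X_k\, e^{j2\pi kt/T}\right] + E\left[\sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} X_k X_m^*\, e^{j2\pi (k - m)t/T}\right] \\
&= R_X(0) - \sum_{k} a_k^* - \sum_{k} a_k + \sum_{k} a_k.
\end{aligned}$$

The above equation equals zero, since the $a_k$ are real and since $R_X(0) = \sum_k a_k$ from Eq.
(9.115).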

If X(t) is a wide-sense stationary random process that is not mean square periodic,
we can still expand X(t) in the Fourier series in an arbitrary interval [0, T]. Mean square
equality will hold only inside the interval. Outside the interval, the expansion repeats
itself with period T. The Fourier coefficients will no longer be orthogonal; instead they
are given by

$$E[X_k X_m^*] = \frac{1}{T^2} \int_0^T \int_0^T R_X(t - u)\, e^{-j2\pi kt/T}\, e^{j2\pi mu/T}\,dt\,du. \tag{9.118}$$

It is easy to show that if X(t) is mean square periodic, then this equation reduces to Eq.
(9.117).


9.9.1 Karhunen-Loeve Expansion


In this section we present the Karhunen-Loeve expansion, which allows us to expand a
(possibly nonstationary) random process X(t) in a series:

$$X(t) = \sum_{k=1}^{\infty} X_k\, \phi_k(t), \quad 0 \le t \le T, \tag{9.119a}$$

where

$$X_k = \int_0^T X(t)\, \phi_k^*(t)\,dt, \tag{9.119b}$$

where the equality in Eq. (9.119a) is in the mean square sense, where the coefficients $\{X_k\}$
are orthogonal random variables, and where the functions $\{\phi_k(t)\}$ are orthonormal:

$$\int_0^T \phi_i(t)\, \phi_j^*(t)\,dt = \delta_{i,j} \quad \text{for all } i, j.$$

In other words, the Karhunen-Loeve expansion provides us with many of the nice properties of the Fourier series for the case where X(t) is not mean square periodic. For simplicity, we again assume that X(t) is zero mean.

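In order to motivate the Karhunen-Loeve expansion, we review the Karhunen-Loeve transform for vector random variables as introduced in Section 6.3. Let $\mathbf{X}$ be a
zero-mean, vector random variable with covariance matrix $\mathbf{K}_X$. The eigenvalues and
eigenvectors of $\mathbf{K}_X$ are obtained from

$$\mathbf{K}_X \mathbf{e}_i = \lambda_i \mathbf{e}_i, \tag{9.120}$$

where the $\mathbf{e}_i$ are column vectors. The set of normalized eigenvectors are orthonormal,
that is, $\mathbf{e}_i^T \mathbf{e}_j = \delta_{i,j}$. Define the matrix $\mathbf{P}$ of eigenvectors and $\mathbf{\Lambda}$ of eigenvalues as

$$\mathbf{P} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n], \qquad \mathbf{\Lambda} = \mathrm{diag}[\lambda_i],$$

then

$$\begin{aligned}
\mathbf{K}_X = \mathbf{P} \mathbf{\Lambda} \mathbf{P}^T
&= [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n]
\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
\begin{bmatrix} \mathbf{e}_1^T \\ \mathbf{e}_2^T \\ \vdots \\ \mathbf{e}_n^T \end{bmatrix} \\
&= [\lambda_1 \mathbf{e}_1, \lambda_2 \mathbf{e}_2, \ldots, \lambda_n \mathbf{e}_n]
\begin{bmatrix} \mathbf{e}_1^T \\ \mathbf{e}_2^T \\ \vdots \\ \mathbf{e}_n^T \end{bmatrix}
= \sum_{k=1}^{n} \lambda_k\, \mathbf{e}_k \mathbf{e}_k^T.
\end{aligned} \tag{9.121a}$$

Therefore we find that the covariance matrix can be expanded as a weighted sum of
matrices, $\mathbf{e}_k \mathbf{e}_k^T$. In addition, if we let $\mathbf{Y} = \mathbf{P}^T \mathbf{X}$, then the random variables in $\mathbf{Y}$ are orthogonal. Furthermore, since $\mathbf{P}\mathbf{P}^T = \mathbf{I}$, then

$$\mathbf{X} = \mathbf{P}\mathbf{Y} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n]
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \sum_{k=1}^{n} Y_k\, \mathbf{e}_k. \tag{9.121b}$$

Thus we see that the arbitrary vector random variable $\mathbf{X}$ can be expanded as a weighted
sum of the eigenvectors of $\mathbf{K}_X$, where the coefficients are orthogonal random variables.
Furthermore the eigenvectors form an orthonormal set. These are exactly the properties we seek in the Karhunen-Loeve expansion for X(t). If the vector random variable
$\mathbf{X}$ is jointly Gaussian, then the components of $\mathbf{Y}$ are independent random variables.
This results in tremendous simplification in a wide variety of problems.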
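
The vector relations in Eqs. (9.120) and (9.121a) can be illustrated on the smallest nontrivial case. The sketch below is restricted to an assumed 2 x 2 covariance matrix so that the eigenvectors follow from a single rotation angle; the matrix entries and the function names are hypothetical.

```python
import math

def kl_transform_2x2(a, b, c):
    """Eigen-decomposition of the symmetric covariance matrix [[a, b], [b, c]].

    Uses the rotation angle that diagonalizes a 2 x 2 symmetric matrix."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))

    def quadratic_form(u, v):
        # Computes u^T K v.
        return u[0] * (a * v[0] + b * v[1]) + u[1] * (b * v[0] + c * v[1])

    lam1, lam2 = quadratic_form(e1, e1), quadratic_form(e2, e2)
    cross = quadratic_form(e1, e2)     # E[Y1 Y2] for Y = P^T X
    return (lam1, lam2), (e1, e2), cross

a, b, c = 2.0, 0.6, 1.0                # assumed covariance entries
(lam1, lam2), (e1, e2), cross = kl_transform_2x2(a, b, c)

# Rebuild K_X as the weighted sum of outer products in Eq. (9.121a).
rebuilt = [[lam1 * e1[i] * e1[j] + lam2 * e2[i] * e2[j] for j in range(2)]
           for i in range(2)]

print(lam1, lam2, cross)
print(rebuilt)
```

The cross term is zero to rounding error, so the transformed components are orthogonal, and the weighted sum of outer products reproduces the original matrix.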

In analogy to Eq. (9.120), we begin by considering the following eigenvalue equation:

$$\int_0^T K_X(t_1, t_2)\, \phi_k(t_2)\,dt_2 = \lambda_k\, \phi_k(t_1), \quad 0 \le t_1 \le T. \tag{9.122}$$

The values $\lambda_k$ and the corresponding functions $\phi_k(t)$ for which the above equation
holds are called the eigenvalues and eigenfunctions of the covariance function
$K_X(t_1, t_2)$. Note that it is possible for the eigenfunctions to be complex-valued, e.g.,
complex exponentials. It can be shown that if $K_X(t_1, t_2)$ is continuous, then the normalized eigenfunctions form an orthonormal set and satisfy Mercer's theorem:

$$K_X(t_1, t_2) = \sum_{k=1}^{\infty} \lambda_k\, \phi_k(t_1)\, \phi_k^*(t_2). \tag{9.123}$$

Note the correspondence between Eq. (9.121) and Eq. (9.123). Equation (9.123) in
turn implies that

$$K_X(t, t) = \sum_{k=1}^{\infty} \lambda_k\, |\phi_k(t)|^2. \tag{9.124}$$


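We are now ready to show that the equality in Eq. (9.119a) holds in the mean
square sense and that the coefficients $X_k$ are orthogonal random variables. First consider $E[X_k X_m^*]$:

$$E[X_k X_m^*] = E\left[\int_0^T X(t')\, \phi_k^*(t')\,dt'\; X_m^*\right] = \int_0^T E[X(t') X_m^*]\, \phi_k^*(t')\,dt'.$$

The integrand of the above equation has

$$E[X(t) X_m^*] = E\left[X(t) \int_0^T X^*(u)\, \phi_m(u)\,du\right] = \int_0^T K_X(t, u)\, \phi_m(u)\,du = \lambda_m\, \phi_m(t).$$

Therefore

$$E[X_k X_m^*] = \int_0^T \lambda_m\, \phi_k^*(t')\, \phi_m(t')\,dt' = \lambda_k\, \delta_{k,m},$$

where $\delta_{k,m}$ is the Kronecker delta function. Thus $X_k$ and $X_m$ are orthogonal random
variables. Note that the above equation implies that $\lambda_k = E[|X_k|^2]$, that is, the eigenvalues are real-valued.
To show that the Karhunen-Loeve expansion equals X(t) in the mean square
sense, we take

$$\begin{aligned}
E\left[\left|X(t) - \sum_{k=1}^{\infty} X_k\, \phi_k(t)\right|^2\right]
&= E[|X(t)|^2] - E\left[X(t) \sum_{k=1}^{\infty} X_k^*\, \phi_k^*(t)\right] \\
&\quad - E\left[X^*(t) \sum_{k=1}^{\infty} X_k\, \phi_k(t)\right] + E\left[\sum_{k=1}^{\infty} \sum_{m=1}^{\infty} X_k X_m^*\, \phi_k(t)\, \phi_m^*(t)\right] \\
&= R_X(t, t) - \sum_{k=1}^{\infty} \lambda_k |\phi_k(t)|^2 - \sum_{k=1}^{\infty} \lambda_k^* |\phi_k(t)|^2 + \sum_{k=1}^{\infty} \lambda_k |\phi_k(t)|^2.
\end{aligned}$$

The above equation equals zero from Eq. (9.124) and from the fact that the $\lambda_k$ are real.
Thus we have shown that Eq. (9.119a) holds in the mean square sense.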

Finally, we note that in the important case where X(t) is a Gaussian random process, 
then the components X, will be independent Gaussian random variables. This result is ex- 
tremely useful in solving certain signal detection and estimation problems. [Van Trees. ] 


Example 9.51 Wiener Process 


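Find the Karhunen-Loeve expansion for the Wiener process.
Equation (9.122) for the Wiener process gives, for $0 \le t_1 \le T$,

$$\lambda\phi(t_1) = \int_0^T \sigma^2\min(t_1, t_2)\,\phi(t_2)\,dt_2
= \sigma^2\int_0^{t_1} t_2\,\phi(t_2)\,dt_2 + \sigma^2\int_{t_1}^{T} t_1\,\phi(t_2)\,dt_2.$$

We differentiate the above integral equation once with respect to $t_1$ to obtain an integral
equation and again to obtain a differential equation:

$$\lambda\frac{d}{dt_1}\phi(t_1) = \sigma^2\int_{t_1}^{T}\phi(t_2)\,dt_2, \qquad
\lambda\frac{d^2}{dt_1^2}\phi(t_1) = -\sigma^2\phi(t_1).$$

This second-order differential equation has a sinusoidal solution:

$$\phi(t_1) = a\sin\frac{\sigma t_1}{\sqrt{\lambda}} + b\cos\frac{\sigma t_1}{\sqrt{\lambda}}.$$

In order to solve the above equation for $a$, $b$, and $\lambda$, we need boundary conditions for the
differential equation. We obtain these by substituting the general solution for $\phi(t)$ into the
integral equation:

$$\lambda\left(a\sin\frac{\sigma t_1}{\sqrt{\lambda}} + b\cos\frac{\sigma t_1}{\sqrt{\lambda}}\right)
= \sigma^2\int_0^{t_1} t_2\,\phi(t_2)\,dt_2 + \sigma^2\int_{t_1}^{T} t_1\,\phi(t_2)\,dt_2.$$

As $t_1$ approaches zero, the right-hand side approaches zero, while the left-hand side
approaches $\lambda b$. This implies that $b = 0$. A second boundary condition is obtained by
letting $t_1$ approach $T$ in the equation obtained after the first differentiation of the
integral equation:

$$0 = \lambda\frac{d}{dt_1}\phi(T) = \lambda\,\frac{\sigma a}{\sqrt{\lambda}}\cos\frac{\sigma T}{\sqrt{\lambda}}.$$

This implies that

$$\frac{\sigma T}{\sqrt{\lambda}} = \left(n - \frac{1}{2}\right)\pi, \qquad n = 1, 2, \ldots.$$

Therefore the eigenvalues are given by

$$\lambda_n = \frac{\sigma^2T^2}{\left(n - \frac{1}{2}\right)^2\pi^2}, \qquad n = 1, 2, \ldots.$$

The normalization requirement implies that

$$1 = \int_0^T\left(a\sin\frac{\sigma t}{\sqrt{\lambda_n}}\right)^2 dt = \frac{a^2T}{2},$$

which implies that $a = (2/T)^{1/2}$. Thus the eigenfunctions are given by

$$\phi_n(t) = \sqrt{\frac{2}{T}}\sin\left(n - \frac{1}{2}\right)\frac{\pi t}{T}, \qquad 0 \le t \le T,$$

and the Karhunen-Loeve expansion for the Wiener process is

$$X(t) = \sum_{n=1}^{\infty} X_n\sqrt{\frac{2}{T}}\sin\left(n - \frac{1}{2}\right)\frac{\pi t}{T}, \qquad 0 \le t < T,$$

where the $X_n$ are zero-mean, independent Gaussian random variables with variance given by $\lambda_n$.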
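The eigenpair found in this example can be checked numerically. The following is a minimal Octave/MATLAB sketch, not part of the original example: the values $T = 1$, $\sigma^2 = 1$, $n = 3$ and the grid size are illustrative assumptions, and the integral in Eq. (9.122) is approximated with a midpoint rule.

```octave
% Numerical check of one Karhunen-Loeve eigenpair of the Wiener process.
% Assumed (hypothetical) values: T = 1, sigma^2 = 1, n = 3, 500 grid points.
T = 1; sigma2 = 1; n = 3; N = 500;
dt = T/N;
t = ((1:N)' - 0.5)*dt;                    % midpoints of the N subintervals
lambda = sigma2*T^2/((n - 0.5)^2*pi^2);   % eigenvalue lambda_n
phi = sqrt(2/T)*sin((n - 0.5)*pi*t/T);    % eigenfunction phi_n(t)
[T2, T1] = meshgrid(t, t);                % T1(i,j) = t(i), T2(i,j) = t(j)
K = sigma2*min(T1, T2);                   % covariance kernel sigma^2*min(t1,t2)
lhs = K*phi*dt;                           % midpoint-rule value of the integral
err = max(abs(lhs - lambda*phi));         % residual of the integral equation
nrm = sum(phi.^2)*dt;                     % approximates the integral of phi_n^2
```

If the eigenpair is correct, err should be small compared with lambda, and nrm should be close to 1.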
Example 9.52 White Gaussian Noise Process


Find the Karhunen-Loeve expansion of the white Gaussian noise process.
The white Gaussian noise process is the derivative of the Wiener process. If we take the
derivative of the Karhunen-Loeve expansion of the Wiener process, we obtain

$$
\begin{aligned}
X'(t) &= \sum_{n=1}^{\infty} X_n\sqrt{\frac{2}{T}}\left(n - \frac{1}{2}\right)\frac{\pi}{T}
\cos\left(n - \frac{1}{2}\right)\frac{\pi t}{T} \\
&= \sum_{n=1}^{\infty} W_n\sqrt{\frac{2}{T}}\cos\left(n - \frac{1}{2}\right)\frac{\pi t}{T}, \qquad 0 \le t < T,
\end{aligned}
$$

where the $W_n$ are independent Gaussian random variables with the same variance $\sigma^2$. This
implies that the process has infinite power, a fact we had already found about the white Gaussian
noise process. In the Problems we will see that any orthonormal set of eigenfunctions can be
used in the Karhunen-Loeve expansion for white Gaussian noise.
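The claim that every $W_n$ has variance $\sigma^2$ follows directly from the eigenvalues of Example 9.51. As a short worked step, writing $W_n = X_n\left(n - \frac{1}{2}\right)\pi/T$ (the scaling produced by differentiating each sine term):

```latex
\mathrm{VAR}[W_n]
  = \left(n - \tfrac{1}{2}\right)^2 \frac{\pi^2}{T^2}\,\lambda_n
  = \left(n - \tfrac{1}{2}\right)^2 \frac{\pi^2}{T^2}
    \cdot \frac{\sigma^2 T^2}{\left(n - \tfrac{1}{2}\right)^2 \pi^2}
  = \sigma^2 .
```

Since the variances do not decay with $n$, the sum of the coefficient variances diverges, which is consistent with the infinite power noted in the example.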


GENERATING RANDOM PROCESSES 


Many engineering systems involve random processes that interact in complex ways. It
is not always possible to model these systems precisely using analytical methods. In
such situations computer simulation methods are used to investigate the system
dynamics and to measure the performance parameters of interest. In this section we
consider two basic methods for generating random processes. The first approach involves
generating the sum process of iid sequences of random variables. We saw that this
approach can be used to generate the binomial and random walk processes, and, through
limiting procedures, the Wiener and Poisson processes. The second approach involves
taking the linear combination of deterministic functions of time where the coefficients
are given by random variables. The Fourier series and Karhunen-Loeve expansion use
this approach. Real systems, e.g., digital modulation systems, also generate random
processes in this manner.


Generating Sum Random Processes 


The generation of sample functions of the sum random process involves two steps: 


1. Generate a sequence of iid random variables that drive the sum process. 
2. Generate the cumulative sum of the iid sequence. 


Let D be an array of samples of the desired iid random variables. The function 
cumsum(D) in Octave and MATLAB then provides the cumulative sum, that is, the sum 
process, that results from the sequence in D. 

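The code below generates m realizations of an n-step random walk process. Each
realization is stored in one column of X, because cumsum operates down the columns
of an array.

> p=1/2;
> n=1000;
> m=4;
> V=-1:2:1;                      % Step values -1 and +1
> P=[1-p,p];                     % Step probabilities
> D=discrete_rnd(V, P, n, m);    % n-by-m array of steps, one realization per column
> X=cumsum(D);                   % Cumulative sum down each column
> plot(X)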


Figures 9.7(a) and 9.7(b) in Section 9.3 show four sample functions of the symmetric
random walk process for $p = 1/2$. The sample functions vary over a wide range of positive
and negative values. Figure 9.7(c) shows four sample functions for $p = 3/4$. The sample
functions now have a strong linear trend consistent with the mean $n(2p - 1)$. The
variability about this trend is somewhat less than in the symmetric case since the variance
function is now $4np(1 - p) = 3n/4$.
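The mean and variance functions quoted above come from the single-step moments: a step equals $+1$ with probability $p$ and $-1$ with probability $1 - p$, so its mean is $2p - 1$ and its variance is $1 - (2p - 1)^2 = 4p(1 - p)$. The sketch below compares these formulas with sample estimates. It is an illustration only: it uses rand in place of discrete_rnd so that it runs in both Octave and MATLAB, and the choice of m = 200 realizations is an assumption.

```octave
% Sketch: compare random walk sample statistics with n(2p-1) and 4np(1-p).
% m = 200 and the use of rand are illustrative assumptions.
p = 3/4; n = 1000; m = 200;
D = 2*double(rand(n, m) < p) - 1;   % n-by-m steps: +1 w.p. p, -1 w.p. 1-p
X = cumsum(D);                      % column j is realization j of the walk
k = (1:n)';
mean_theory = k*(2*p - 1);          % mean function n(2p-1)
var_theory = 4*k*p*(1 - p);         % variance function 4np(1-p)
mean_est = mean(transpose(X));      % sample mean across the m realizations
var_est = var(transpose(X));        % sample variance across the m realizations
plot(k, X(:, 1:4))                  % four of the sample functions
```

Plotting mean_est and var_est against mean_theory and var_theory shows how closely the estimates track the linear trends.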

We can generate an approximation to a Poisson process by summing iid
Bernoulli random variables. Figure 9.18(a) shows ten realizations of Poisson processes
with $\lambda = 0.4$ arrivals per second. The sample functions for $T = 50$ seconds were
generated using a 1000-step binomial process with $p = \lambda T/n = 0.02$. The linear
increasing trend of the Poisson process is evident in the figure. Figure 9.18(b) shows the
estimate of the mean and variance functions obtained by averaging across the 10
realizations. The linear trend in the sample mean function is very clear; the sample
variance function is also linear but is much more variable. The mean and variance
functions of the realizations are obtained using the commands mean(transpose(X))
and var(transpose(X)).
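The procedure described in this paragraph can be sketched as follows. This is a hedged illustration rather than the code used for Figure 9.18: the Bernoulli variables are drawn with rand, and the variable names are assumptions.

```octave
% Sketch: approximate Poisson process from a 1000-step binomial process.
lambda = 0.4; T = 50; n = 1000; m = 10;
p = lambda*T/n;                     % success probability per subinterval (0.02)
D = double(rand(n, m) < p);         % Bernoulli(p) arrival indicators, n-by-m
X = cumsum(D);                      % column j approximates one realization of N(t)
t = (1:n)'*T/n;                     % time axis in seconds
mean_est = mean(transpose(X));      % sample mean function across realizations
var_est = var(transpose(X));        % sample variance function across realizations
plot(t, X)
```

Both mean_est and var_est should follow the line $\lambda t$ approximately, with var_est noticeably noisier for only ten realizations.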

We can generate sample functions of the random telegraph signal by taking the
Poisson process $N(t)$ and calculating $X(t) = 2(N(t) \bmod 2) - 1$. Figure 9.19(a)
shows a realization of the random telegraph signal. Figure 9.19(b) shows an estimate of
the covariance function of the random telegraph signal. The exponential decay in the
covariance function can be seen in the figure. See Eq. (9.44).
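A minimal sketch of this construction follows, assuming the same Bernoulli approximation to the Poisson process as above (the parameter values repeat those of Figure 9.18 and are otherwise illustrative):

```octave
% Sketch: random telegraph signal from an approximate Poisson counting process.
lambda = 0.4; T = 50; n = 1000;
p = lambda*T/n;
N = cumsum(double(rand(n, 1) < p));   % approximate Poisson counts N(t)
X = 2*mod(N, 2) - 1;                  % -1 when N(t) is even, +1 when N(t) is odd
t = (1:n)'*T/n;
stairs(t, X)
```

The signal changes polarity at every arrival, so the time between sign changes is approximately exponential with mean $1/\lambda$.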
FIGURE 9.18
(a) Ten sample functions of a Poisson random process with $\lambda = 0.4$. (b) Sample mean and variance of ten sample
functions of a Poisson random process with $\lambda = 0.4$.

FIGURE 9.19
(a) Sample function of a random telegraph process with $\lambda = 0.4$. (b) Estimate of covariance function of a random
telegraph process.


The covariance function is computed using the function CX_est below.


function [CXall]=CX_est(X, L, M_est)
N=length(X);                  % N is number of samples
CX=zeros(1,L+1);              % L is maximum lag
M_est=mean(X);                % Sample mean (overrides the input value)
for m=1:L+1,                  % Add product terms for lag m-1
  for n=1:N-m+1,
    CX(m)=CX(m)+(X(n)-M_est)*(X(n+m-1)-M_est);
  end;
  CX(m)=CX(m)/(N-m+1);        % Normalize by number of terms
end;
for k=1:L,
  CXall(k)=CX(L+2-k);         % Entries 1 to L: lags -L to -1
end
CXall(L+1:2*L+1)=CX(1:L+1);   % Entries L+1 to 2L+1: lags 0 to L


The Wiener random process can also be generated as a sum process. One
approach is to generate a properly scaled random walk process, as in Eq. (9.50). A better
approach is to note that the Wiener process has independent Gaussian increments, as
in Eq. (9.52). We therefore generate the sequence D of increments for the time
subintervals and then find the corresponding sum process. The code below
generates a sample of the Wiener process:

> a=2;
> delta=0.001;
> n=1000;
> D=normal_rnd(0,a*delta,1,n);   % Gaussian increments: mean 0, variance a*delta
> X=cumsum(D);
> plot(X)


FIGURE 9.20
Sample mean and variance functions from 50 realizations of
Wiener process.


Figure 9.12 in Section 9.5 shows four sample functions of a Brownian motion process
with $\alpha = 2$. Figure 9.20 shows the sample mean and sample variance of 50 sample
functions of the Wiener process with $\alpha = 2$. It can be seen that the mean across the 50
realizations is close to zero, which is the actual mean function for the process. The
sample variance across the 50 realizations increases steadily and is close to the actual
variance function, which is $\alpha t = 2t$.
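The experiment summarized in Figure 9.20 can be sketched as below. This is an assumption-labeled illustration, not the book's code: it uses randn scaled by the standard deviation $\sqrt{\alpha\delta}$, on the assumption that the second argument of normal_rnd in the listing above is a variance.

```octave
% Sketch: 50 realizations of a Wiener process and their sample statistics.
alpha = 2; delta = 0.001; n = 1000; m = 50;
D = sqrt(alpha*delta)*randn(n, m);   % increments: mean 0, variance alpha*delta
X = cumsum(D);                       % column j is one realization
t = (1:n)'*delta;
mean_est = mean(transpose(X));       % sample mean function (should stay near 0)
var_est = var(transpose(X));         % sample variance function (near alpha*t)
plot(t, mean_est, t, var_est, t, alpha*t)
```

With only 50 realizations the sample variance fluctuates visibly around the line $\alpha t$, as in the figure.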


9.10.2 Generating Linear Combinations of Deterministic Functions


In some situations a random process can be represented as a linear combination 
of deterministic functions where the coefficients are random variables. The Fouri- 
er series and the Karhunen-Loeve expansions are examples of this type of repre- 
sentation. 

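In Example 9.51 let the parameters in the Karhunen-Loeve expansion for a
Wiener process in the interval $0 \le t \le T$ be $T = 1$, $\sigma^2 = 1$:

$$X(t) = \sum_{n=1}^{\infty} X_n\sqrt{\frac{2}{T}}\sin\left(n - \frac{1}{2}\right)\frac{\pi t}{T}
= \sum_{n=1}^{\infty} X_n\sqrt{2}\sin\left(n - \frac{1}{2}\right)\pi t,$$

where the $X_n$ are zero-mean, independent Gaussian random variables with variance

$$\lambda_n = \frac{\sigma^2T^2}{\left(n - \frac{1}{2}\right)^2\pi^2} = \frac{1}{\left(n - \frac{1}{2}\right)^2\pi^2}.$$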


The following code generates the 100 Gaussian coefficients for the Karhunen-Loeve
expansion for the Wiener process and forms the resulting sample function.

> M=zeros(100,1);
> n=1:1:100;                   % Number of coefficients
> N=transpose(n);
> v=1./((N-0.5).^2*pi^2);      % Variances of coefficients
> t=0.01:0.01:1;
> p=(N-0.5)*t;                 % Argument of sinusoid
> x=normal_rnd(M,v,100,1);     % Gaussian coefficients
> y=sqrt(2)*sin(pi*p);         % sin terms
> z=transpose(x)*y;
> plot(z)

FIGURE 9.21
Sample functions for Wiener process using 100 terms in Karhunen-
Loeve expansion.


Figure 9.21 shows the Karhunen-Loeve expansion for the Wiener process using 100
terms. The sample functions generally exhibit the same type of behavior as in the previous
figures. The sample functions, however, do not exhibit the jaggedness of the other
examples, which are based on the generation of many more random variables.
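One way to see how much of the process a 100-term expansion captures is to compare the variance of the truncated sum, $\sum_n \lambda_n\phi_n^2(t)$, with the Wiener variance $\sigma^2 t = t$. The sketch below does this. It is an illustration under stated assumptions: it uses randn scaled by $\sqrt{\lambda_n}$ instead of normal_rnd, and the names var_trunc and max_gap are hypothetical.

```octave
% Sketch: truncated Karhunen-Loeve expansion of the Wiener process (T = 1, sigma^2 = 1).
K = 100;                             % number of terms kept
N = (1:K)';
v = 1./((N - 0.5).^2*pi^2);          % coefficient variances lambda_n
t = 0.01:0.01:1;
y = sqrt(2)*sin(pi*(N - 0.5)*t);     % K-by-100 array; row n holds phi_n(t)
x = sqrt(v).*randn(K, 1);            % coefficients X_n with variance lambda_n
z = transpose(x)*y;                  % one sample function of the truncated sum
var_trunc = transpose(v)*(y.^2);     % variance of the truncated sum at each t
max_gap = max(abs(var_trunc - t));   % shortfall relative to Var[X(t)] = t
plot(t, z)
```

The gap is the contribution of the discarded high-frequency terms, which is also why the sample functions look smoother than those built from many independent increments.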


SUMMARY 


• A random process or stochastic process is an indexed family of random variables
that is specified by the set of joint distributions of any number and choice of
random variables in the family. The mean, autocovariance, and autocorrelation
functions summarize some of the information contained in the joint distributions of
pairs of time samples.

• The sum process of an iid sequence has the property of stationary and independent
increments, which facilitates the evaluation of the joint pdf/pmf of the
process at any set of time instants. The binomial and random walk processes are sum
processes. The Poisson and Wiener processes are obtained as limiting forms of
these sum processes.

• The Poisson process has independent, stationary increments that are Poisson
distributed. The interarrival times in a Poisson process are iid exponential random
variables.

• The mean and covariance functions completely specify all joint distributions of a
Gaussian random process.

• The Wiener process has independent, stationary increments that are Gaussian
distributed. The Wiener process is a Gaussian random process.

• A random process is stationary if its joint distributions are independent of the
choice of time origin. If a random process is stationary, then $m_X(t)$ is constant,
and $R_X(t_1, t_2)$ depends only on $t_1 - t_2$.

• A random process is wide-sense stationary (WSS) if its mean is constant and if its
autocorrelation and autocovariance depend only on $t_1 - t_2$. A WSS process need
not be stationary.

• A wide-sense stationary Gaussian random process is also stationary.

• A random process is cyclostationary if its joint distributions are invariant with
respect to shifts of the time origin by integer multiples of some period $T$.

• The white Gaussian noise process results from taking the derivative of the
Wiener process.

• The derivative and integral of a random process are defined as limits of random
variables. We investigated the existence of these limits in the mean square sense.

• The mean and autocorrelation functions of the output of systems described by a
linear differential equation and subject to random process inputs can be obtained
by solving a set of differential equations. If the input process is a Gaussian
random process, then the output process is also Gaussian.

• Ergodic theorems state when time-average estimates of a parameter of a random
process converge to the expected value of the parameter. The decay rate of the
covariance function determines the convergence rate of the sample mean.


CHECKLIST OF IMPORTANT TERMS 


Autocorrelation function
Autocovariance function
Average power
Bernoulli random process
Binomial counting process
Continuous-time process
Cross-correlation function
Cross-covariance function
Cyclostationary random process
Discrete-time process
Ergodic theorem
Fourier series
Gaussian random process
Hurst parameter
iid random process
Independent increments
Independent random processes
Karhunen-Loeve expansion
Markov random process
Mean ergodic random process
Mean function
Mean square continuity
Mean square derivative
Mean square integral
Mean square periodic process
Ornstein-Uhlenbeck process
Orthogonal random processes
Poisson process
Random process
Random telegraph signal
Random walk process
Realization, sample path, or sample function
Shot noise
Stationary increments
Stationary random process
Stochastic process
Sum random process
Time average
Uncorrelated random processes
Variance of X(t)
White Gaussian noise
Wide-sense cyclostationary process
Wiener process
WSS random process


ANNOTATED REFERENCES


References [1] through [6] can be consulted for further reading on random processes.
Larson and Shubert [7] and Yaglom [8] contain excellent discussions on white
Gaussian noise and Brownian motion. Van Trees [9] gives detailed examples on the
application of the Karhunen-Loeve expansion. Beran [10] discusses long memory
processes.
Sections 9.1 and 9.2: Definition and Specification of a Stochastic Process


9.1. In Example 9.1, find the joint pmf for $X_1$ and $X_2$. Why are $X_1$ and $X_2$ independent?

9.2. A discrete-time random process $X_n$ is defined as follows. A fair die is tossed and the
outcome $k$ is observed. The process is then given by $X_n = k$ for all $n$.
(a) Sketch some sample paths of the process.
(b) Find the pmf for $X_n$.
(c) Find the joint pmf for $X_n$ and $X_{n+k}$.
(d) Find the mean and autocovariance functions of $X_n$.

9.3. A discrete-time random process $X_n$ is defined as follows. A fair coin is tossed. If the
outcome is heads, $X_n = (-1)^n$ for all $n$; if the outcome is tails, $X_n = (-1)^{n+1}$ for all $n$.
(a) Sketch some sample paths of the process.
(b) Find the pmf for $X_n$.
(c) Find the joint pmf for $X_n$ and $X_{n+k}$.
(d) Find the mean and autocovariance functions of $X_n$.

9.4. A discrete-time random process is defined by $X_n = s^n$, for $n \ge 0$, where $s$ is selected at
random from the interval (0, 1).
(a) Sketch some sample paths of the process.
(b) Find the cdf of $X_n$.
(c) Find the joint cdf for $X_n$ and $X_{n+1}$.
(d) Find the mean and autocovariance functions of $X_n$.
(e) Repeat parts a, b, c, and d if $s$ is uniform in (1, 2).

9.5. Let $g(t)$ be the rectangular pulse shown in Fig. P9.1. The random process $X(t)$ is defined as

$$X(t) = Ag(t),$$

where $A$ assumes the values $\pm 1$ with equal probability.

FIGURE P9.1 (rectangular pulse $g(t)$; the $t$ axis is marked at 0 and 1)

(a) Find the pmf of $X(t)$.
(b) Find $m_X(t)$.
(c) Find the joint pmf of $X(t)$ and $X(t + d)$.
(d) Find $C_X(t, t + d)$, $d > 0$.

9.6. A random process is defined by

$$Y(t) = g(t - T),$$

where $g(t)$ is the rectangular pulse of Fig. P9.1, and $T$ is a uniformly distributed random
variable in the interval (0, 1).
(a) Find the pmf of $Y(t)$.
(b) Find $m_Y(t)$ and $C_Y(t_1, t_2)$.

9.7. A random process is defined by

$$X(t) = g(t - T),$$

where $T$ is a uniform random variable in the interval (0, 1) and $g(t)$ is the periodic
triangular waveform shown in Fig. P9.2.

FIGURE P9.2 (periodic triangular waveform $g(t)$; the $t$ axis is marked at 0, 1, 2, 3)

(a) Find the cdf of $X(t)$ for $0 < t < 1$.
(b) Find $m_X(t)$ and $C_X(t_1, t_2)$.

9.8. Let $Y(t) = g(t - T)$ as in Problem 9.6, but let $T$ be an exponentially distributed random
variable with parameter $\alpha$.
(a) Find the pmf of $Y(t)$.
(b) Find the joint pmf of $Y(t)$ and $Y(t + d)$. Consider two cases: $d > 1$, and $0 < d < 1$.
(c) Find $m_Y(t)$ and $C_Y(t, t + d)$ for $d > 1$ and $0 < d < 1$.

9.9. Let $Z(t) = At^2 + B$, where $A$ and $B$ are independent random variables.
(a) Find the pdf of $Z(t)$.
(b) Find $m_Z(t)$ and $C_Z(t_1, t_2)$.

9.10. Find an expression for $E[|X_{t_2} - X_{t_1}|^2]$ in terms of the autocorrelation function.

9.11. The random process $H(t)$ is defined as the "hard-limited" version of $X(t)$:

$$H(t) = \begin{cases} +1 & \text{if } X(t) \ge 0 \\ -1 & \text{if } X(t) < 0. \end{cases}$$

(a) Find the pdf, mean, and autocovariance of $H(t)$ if $X(t)$ is the sinusoid with a random
amplitude presented in Example 9.2.
(b) Find the pdf, mean, and autocovariance of $H(t)$ if $X(t)$ is the sinusoid with random
phase presented in Example 9.9.
(c) Find a general expression for the mean of $H(t)$ in terms of the cdf of $X(t)$.

9.12. (a) Are independent random processes orthogonal? Explain.
(b) Are orthogonal random processes uncorrelated? Explain.
(c) Are uncorrelated processes independent?
(d) Are uncorrelated processes orthogonal?

9.13. The random process $Z(t)$ is defined by

$$Z(t) = 2Xt - Y,$$


where $X$ and $Y$ are a pair of random variables with means $m_X$, $m_Y$, variances $\sigma_X^2$, $\sigma_Y^2$,
and correlation coefficient $\rho_{X,Y}$. Find the mean and autocovariance of $Z(t)$.

9.14. Let $H(t)$ be the output of the hard limiter in Problem 9.11.
(a) Find the cross-correlation and cross-covariance between $H(t)$ and $X(t)$ when the
input is a sinusoid with random amplitude as in Problem 9.11a.
(b) Repeat if the input is a sinusoid with random phase as in Problem 9.11b.
(c) Are the input and output processes uncorrelated? Orthogonal?

9.15. Let $Y_n = X_n + g(n)$ where $X_n$ is a zero-mean discrete-time random process and $g(n)$ is
a deterministic function of $n$.
(a) Find the mean and variance of $Y_n$.
(b) Find the joint cdf of $Y_n$ and $Y_{n+1}$.
(c) Find the autocovariance function of $Y_n$.
(d) Plot typical sample functions for $X_n$ and $Y_n$ if: $g(n) = n$; $g(n) = 1/n^2$; $g(n) = 1/n$.

9.16. Let $Y_n = c(n)X_n$ where $X_n$ is a zero-mean, unit-variance, discrete-time random process
and $c(n)$ is a deterministic function of $n$.
(a) Find the mean and variance of $Y_n$.
(b) Find the joint cdf of $Y_n$ and $Y_{n+1}$.
(c) Find the autocovariance function of $Y_n$.
(d) Plot typical sample functions for $X_n$ and $Y_n$ if: $c(n) = n$; $c(n) = 1/n^2$; $c(n) = 1/n$.

9.17. (a) Find the cross-correlation and cross-covariance for $X_n$ and $Y_n$ in Problem 9.15.
(b) Find the joint pdf of $X_n$ and $Y_{n+1}$.
(c) Determine whether $X_n$ and $Y_n$ are uncorrelated, independent, or orthogonal
random processes.

9.18. (a) Find the cross-correlation and cross-covariance for $X_n$ and $Y_n$ in Problem 9.16.
(b) Find the joint pdf of $X_n$ and $Y_{n+1}$.
(c) Determine whether $X_n$ and $Y_n$ are uncorrelated, independent, or orthogonal
random processes.

9.19. Suppose that $X(t)$ and $Y(t)$ are independent random processes and let

$$U(t) = X(t) - Y(t)$$
$$V(t) = X(t) + Y(t).$$

(a) Find $C_{UX}(t_1, t_2)$, $C_{UY}(t_1, t_2)$, and $C_{UV}(t_1, t_2)$.
(b) Find $f_{U(t_1)X(t_2)}(u, x)$ and $f_{U(t_1)V(t_2)}(u, v)$. Hint: Use auxiliary variables.

9.20. Repeat Problem 9.19 if $X(t)$ and $Y(t)$ are independent discrete-time processes and $X(t)$
and $Y(t)$ are different iid random processes.


Section 9.3: Sum Process, Binomial Counting Process, and Random Walk 


9.21. (a) Let $Y_n$ be the process that results when individual 1's in a Bernoulli process are
erased with probability $\alpha$. Find the pmf of $S_n$, the counting process for $Y_n$. Does $Y_n$
have independent and stationary increments?
(b) Repeat part a if in addition to the erasures, individual 0's in the Bernoulli process
are changed to 1's with probability $\beta$.

9.22. Let $S_n$ denote a binomial counting process.
(a) Show that $P[S_n = j, S_{n'} = i] \ne P[S_n = j]P[S_{n'} = i]$.
(b) Find $P[S_{n_2} = j \mid S_{n_1} = i]$, where $n_2 > n_1$.
(c) Show that $P[S_{n_2} = j \mid S_{n_1} = i, S_{n_0} = k] = P[S_{n_2} = j \mid S_{n_1} = i]$, where $n_2 > n_1 > n_0$.

9.23. (a) Find $P[S_n = 0]$ for the random walk process.
(b) What is the answer in part a if $p = 1/2$?

9.24. Consider the following moving average processes:

$$Y_n = \tfrac{1}{2}(X_n + X_{n-1}), \qquad X_0 = 0$$
$$Z_n = \tfrac{2}{3}X_n + \tfrac{1}{3}X_{n-1}, \qquad X_0 = 0$$

(a) Find the mean, variance, and covariance of $Y_n$ and $Z_n$ if $X_n$ is a Bernoulli random
process.
(b) Repeat part a if $X_n$ is the random step process.
(c) Generate 100 outcomes of a Bernoulli random process $X_n$, and find the resulting $Y_n$
and $Z_n$. Are the sample means of $Y_n$ and $Z_n$ in part a close to their respective
means?
(d) Repeat part c with $X_n$ given by the random step process.

9.25. Consider the following autoregressive processes:

$$W_n = 2W_{n-1} + X_n, \qquad W_0 = 0$$
$$Z_n = \tfrac{3}{4}Z_{n-1} + X_n, \qquad Z_0 = 0.$$

(a) Suppose that $X_n$ is a Bernoulli process. What trends do the processes exhibit?
(b) Express $W_n$ and $Z_n$ in terms of $X_n, X_{n-1}, \ldots, X_1$ and then find $E[W_n]$ and $E[Z_n]$.
Do these results agree with the trends you expect?
(c) Do $W_n$ or $Z_n$ have independent increments? stationary increments?
(d) Generate 100 outcomes of a Bernoulli process. Find the resulting realizations of $W_n$
and $Z_n$. Is the sample mean meaningful for either of these processes?
(e) Repeat part d if $X_n$ is the random step process.

9.26. Let $M_n$ be the discrete-time process defined as the sequence of sample means of an iid
sequence:

$$M_n = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

(a) Find the mean, variance, and covariance of $M_n$.
(b) Does $M_n$ have independent increments? stationary increments?

9.27. Find the pdf of the processes defined in Problem 9.24 if the $X_n$ are an iid sequence of
zero-mean, unit-variance Gaussian random variables.

9.28. Let $X_n$ consist of an iid sequence of Cauchy random variables.
(a) Find the pdf of the sum process $S_n$. Hint: Use the characteristic function method.
(b) Find the joint pdf of $S_n$ and $S_{n+k}$.

9.29. Let $X_n$ consist of an iid sequence of Poisson random variables with mean $\alpha$.
(a) Find the pmf of the sum process $S_n$.
(b) Find the joint pmf of $S_n$ and $S_{n+k}$.


9.30. Let $X_n$ be an iid sequence of zero-mean, unit-variance Gaussian random variables.
(a) Find the pdf of $M_n$ defined in Problem 9.26.
(b) Find the joint pdf of $M_n$ and $M_{n+k}$. Hint: Use the independent increments property
of $S_n$.

9.31. Repeat Problem 9.26 with $X_n = \tfrac{1}{2}(Y_n + Y_{n-1})$, where $Y_n$ is an iid random process.
What happens to the variance of $M_n$ as $n$ increases?

9.32. Repeat Problem 9.26 with $X_n = \tfrac{3}{4}X_{n-1} + Y_n$, where $Y_n$ is an iid random process. What
happens to the variance of $M_n$ as $n$ increases?

9.33. Suppose that an experiment has three possible outcomes, say 0, 1, and 2, and suppose that
these occur with probabilities $p_0$, $p_1$, and $p_2$, respectively. Consider a sequence of
independent repetitions of the experiment, and let $X_j(n)$ be the indicator function for
outcome $j$. The vector

$$\mathbf{X}(n) = (X_0(n), X_1(n), X_2(n))$$

then constitutes a vector-valued Bernoulli random process. Consider the counting
process for $\mathbf{X}(n)$:

$$\mathbf{S}(n) = \mathbf{X}(n) + \mathbf{X}(n-1) + \cdots + \mathbf{X}(1), \qquad \mathbf{S}(0) = \mathbf{0}.$$

(a) Show that $\mathbf{S}(n)$ has a multinomial distribution.
(b) Show that $\mathbf{S}(n)$ has independent increments, then find the joint pmf of $\mathbf{S}(n)$ and
$\mathbf{S}(n + k)$.
(c) Show that components $S_j(n)$ of the vector process are binomial counting
processes.


Section 9.4: Poisson and Associated Random Processes 


9.34. A server handles queries that arrive according to a Poisson process with a rate of 10
queries per minute. What is the probability that no queries go unanswered if the server is
unavailable for 20 seconds?

9.35. Customers deposit $1 in a vending machine according to a Poisson process with rate $\lambda$.
The machine issues an item with probability $p$. Find the pmf for the number of items
dispensed in time $t$.

9.36. Noise impulses occur in a radio transmission according to a Poisson process of rate $\lambda$.
(a) Find the probability that no impulses occur during the transmission of a message
that is $t$ seconds long.
(b) Suppose that the message is encoded so that the errors caused by up to 2 impulses can
be corrected. What is the probability that a $t$-second message cannot be corrected?

9.37. Packets arrive at a multiplexer at two ports according to independent Poisson processes
of rates $\lambda_1 = 1$ and $\lambda_2 = 2$ packets/second, respectively.
(a) Find the probability that a message arrives first on line 2.
(b) Find the pdf for the time until a message arrives on either line.
(c) Find the pmf for $N(t)$, the total number of messages that arrive in an interval of
length $t$.
(d) Generalize the result of part c for the "merging" of $k$ independent Poisson processes
of rates $\lambda_1, \ldots, \lambda_k$, respectively:

$$N(t) = N_1(t) + \cdots + N_k(t).$$
9.38. (a) Find $P[N(t - d) = j \mid N(t) = k]$ with $d > 0$, where $N(t)$ is a Poisson process with
rate $\lambda$.
(b) Compare your answer to $P[N(t + d) = j \mid N(t) = k]$. Explain the difference, if
any.

9.39. Let $N_1(t)$ be a Poisson process with arrival rate $\lambda_1$ that is started at $t = 0$. Let $N_2(t)$ be
another Poisson process that is independent of $N_1(t)$, that has arrival rate $\lambda_2$, and that is
started at $t = 1$.
(a) Show that the pmf of the process $N(t) = N_1(t) + N_2(t)$ is given by:

$$P[N(t + \tau) - N(t) = k] = \frac{(m(t + \tau) - m(t))^k}{k!}e^{-(m(t+\tau) - m(t))} \qquad \text{for } k = 0, 1, \ldots$$

where $m(t) = E[N(t)]$.
(b) Now consider a Poisson process in which the arrival rate $\lambda(t)$ is a piecewise constant
function of time. Explain why the pmf of the process is given by the above pmf
where

$$m(t) = \int_0^t \lambda(t')\,dt'.$$

(c) For what other arrival functions $\lambda(t)$ does the pmf in part a hold?

9.40. (a) Suppose that the time required to service a customer in a queueing system is a
random variable $T$. If customers arrive at the system according to a Poisson process
with parameter $\lambda$, find the pmf for the number of customers that arrive during one
customer's service time. Hint: Condition on the service time.
(b) Evaluate the pmf in part a if $T$ is an exponential random variable with parameter $\beta$.

9.41. (a) Is the difference of two independent Poisson random processes also a Poisson
process?
(b) Let $N_p(t)$ be the number of complete pairs generated by a Poisson process up to
time $t$. Explain why $N_p(t)$ is or is not a Poisson process.

9.42. Let $N(t)$ be a Poisson random process with parameter $\lambda$. Suppose that each time an event
occurs, a coin is flipped and the outcome (heads or tails) is recorded. Let $N_1(t)$ and $N_2(t)$
denote the number of heads and tails recorded up to time $t$, respectively. Assume that $p$ is
the probability of heads.
(a) Find $P[N_1(t) = j, N_2(t) = k \mid N(t) = k + j]$.
(b) Use part a to show that $N_1(t)$ and $N_2(t)$ are independent Poisson random variables
of rates $p\lambda t$ and $(1 - p)\lambda t$, respectively:

$$P[N_1(t) = j, N_2(t) = k] = \frac{(p\lambda t)^j}{j!}e^{-p\lambda t}\,\frac{((1 - p)\lambda t)^k}{k!}e^{-(1-p)\lambda t}.$$

9.43. Customers play a $1 game machine according to a Poisson process with rate $\lambda$. Suppose
the machine dispenses a random reward $X$ each time it is played. Let $X(t)$ be the total
reward issued up to time $t$.
(a) Find expressions for $P[X(t) = j]$ if $X$ is Bernoulli.
(b) Repeat part a if $X$ assumes the values {0, 5} with probabilities (5/6, 1/6).
(c) Repeat part a if $X$ is Poisson with mean 1.
(d) Repeat part a if with probability $p$ the machine returns all the coins.

9.44. Let $X(t)$ denote the random telegraph signal, and let $Y(t)$ be a process derived from $X(t)$
as follows: Each time $X(t)$ changes polarity, $Y(t)$ changes polarity with probability $p$.
(a) Find $P[Y(t) = \pm 1]$.
(b) Find the autocovariance function of $Y(t)$. Compare it to that of $X(t)$.

9.45. Let $Y(t)$ be the random signal obtained by switching between the values 0 and 1
according to the events in a Poisson process of rate $\lambda$. Compare the pmf and autocovariance of
$Y(t)$ with that of the random telegraph signal.


9.46. Let $Z(t)$ be the random signal obtained by switching between the values 0 and 1
according to the events in a counting process $N(t)$. Let

$$P[N(t) = k] = \frac{1}{1 + \lambda t}\left(\frac{\lambda t}{1 + \lambda t}\right)^k \qquad k = 0, 1, 2, \ldots.$$

(a) Find the pmf of $Z(t)$.
(b) Find $m_Z(t)$.
9.47. In the filtered Poisson process (Eq. (9.45)), let h(t) be a pulse of unit amplitude and dura- 
tion T seconds. 
(a) Show that X(t) is then the increment in the Poisson process in the interval (t — T, t). 
(b) Find the mean and autocorrelation functions of X(t). 


9.48. (a) Find the second moment and variance of the shot noise process discussed in 
Example 9.25. 


(b) Find the variance of the shot noise process if $h(t) = e^{-\beta t}$ for $t \ge 0$.


9.49. Messages arrive at a message center according to a Poisson process of rate A. Every 
hour the messages that have arrived during the previous hour are forwarded to their 
destination. Find the mean of the total time waited by all the messages that arrive 
during the hour. Hint: Condition on the number of arrivals and consider the arrival 
instants. 


Section 9.5: Gaussian Random Process, Wiener Process and Brownian Motion 


9.50. Let X(t) and Y(t) be jointly Gaussian random processes. Explain the relation be- 
tween the conditions of independence, uncorrelatedness, and orthogonality of X(t) 
and Y(t). 

9.51. Let $X(t)$ be a zero-mean Gaussian random process with autocovariance function given by

$$C_X(t_1, t_2) = 4e^{-2|t_1 - t_2|}.$$

Find the joint pdf of $X(t)$ and $X(t + s)$.
9.52. Find the pdf of Z(t) in Problem 9.13 if X and Y are jointly Gaussian random variables. 
9.53. Let Y(t) = X(t + d) — X(t), where X(t) is a Gaussian random process. 

(a) Find the mean and autocovariance of Y(t). 

(b) Find the pdf of Y(t). 

(c) Find the joint pdf of Y(t) and Y(t + s). 

(d) Show that Y(t) is a Gaussian random process. 
9.54. Let $X(t) = A\cos\omega t + B\sin\omega t$, where $A$ and $B$ are iid Gaussian random variables with
zero mean and variance $\sigma^2$.
(a) Find the mean and autocovariance of $X(t)$.
(b) Find the joint pdf of $X(t)$ and $X(t + s)$.

9.55. Let $X(t)$ and $Y(t)$ be independent Gaussian random processes with zero means and the
same covariance function $C(t_1, t_2)$. Define the "amplitude-modulated signal" by

$$Z(t) = X(t)\cos\omega t + Y(t)\sin\omega t.$$

(a) Find the mean and autocovariance of $Z(t)$.
(b) Find the pdf of $Z(t)$.

9.56. Let $X(t)$ be a zero-mean Gaussian random process with autocovariance function given by
$C_X(t_1, t_2)$. If $X(t)$ is the input to a "square law detector," then the output is

$$Y(t) = X(t)^2.$$

Find the mean and autocovariance of the output $Y(t)$.

9.57. Let $Y(t) = X(t) + \mu t$, where $X(t)$ is the Wiener process.
(a) Find the pdf of $Y(t)$.
(b) Find the joint pdf of $Y(t)$ and $Y(t + s)$.

9.58. Let $Y(t) = X^2(t)$, where $X(t)$ is the Wiener process.
(a) Find the pdf of $Y(t)$.
(b) Find the conditional pdf of $Y(t_2)$ given $Y(t_1)$.

9.59. Let $Z(t) = X(t) - aX(t - s)$, where $X(t)$ is the Wiener process.
(a) Find the pdf of $Z(t)$.
(b) Find $m_Z(t)$ and $C_Z(t_1, t_2)$.

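9.60. (a) For $X(t)$ the Wiener process with $\alpha = 1$ and $0 < t < 1$, show that the joint pdf of
$X(t)$ and $X(1)$ is given by:

$$f_{X(t),X(1)}(x_1, x_2) = \frac{\exp\left\{-\frac{1}{2}\left[\frac{x_1^2}{t} + \frac{(x_2 - x_1)^2}{1 - t}\right]\right\}}{2\pi\sqrt{t(1 - t)}}.$$

(b) Use part a to show that for $0 < t < 1$, the conditional pdf of $X(t)$ given
$X(0) = X(1) = 0$ is:

$$f_{X(t)}(x \mid X(0) = X(1) = 0) = \frac{\exp\left\{-\frac{1}{2}\left[\frac{x^2}{t(1 - t)}\right]\right\}}{\sqrt{2\pi t(1 - t)}}.$$

(c) Use part b to find the conditional pdf of $X(t)$ given $X(t_1) = a$ and $X(t_2) = b$ for
$t_1 < t < t_2$. Hint: Find the equivalent process in the interval $(0, t_2 - t_1)$.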
Section 9.6: Stationary Random Processes


9.61. (a) Is the random amplitude sinusoid in Example 9.9 a stationary random process? Is it
wide-sense stationary?
(b) Repeat part a for the random phase sinusoid in Example 9.10.

9.62. A discrete-time random process $X_n$ is defined as follows. A fair coin is tossed; if the
outcome is heads then $X_n = 1$ for all $n$, and $X_n = -1$ for all $n$, otherwise.
(a) Is $X_n$ a WSS random process?
(b) Is $X_n$ a stationary random process?
(c) Do the answers in parts a and b change if $p$ is a biased coin?

9.63. Let $X_n$ be the random process in Problem 9.3.
(a) Is $X_n$ a WSS random process?
(b) Is $X_n$ a stationary random process?
(c) Is $X_n$ a cyclostationary random process?

9.64. Let $X(t) = g(t - T)$, where $g(t)$ is the periodic waveform introduced in Problem 9.7,
and $T$ is a uniformly distributed random variable in the interval (0, 1). Is $X(t)$ a stationary
random process? Is $X(t)$ wide-sense stationary?

9.65. Let $X(t)$ be defined by

$$X(t) = A\cos\omega t + B\sin\omega t,$$

where $A$ and $B$ are iid random variables.
(a) Under what conditions is $X(t)$ wide-sense stationary?
(b) Show that $X(t)$ is not stationary. Hint: Consider $E[X^3(t)]$.

9.66. Consider the following moving average process:

$$Y_n = \tfrac{1}{2}(X_n + X_{n-1}), \qquad X_0 = 0.$$

(a) Is $Y_n$ a stationary random process if $X_n$ is an iid integer-valued process?
(b) Is $Y_n$ a stationary random process if $X_n$ is a stationary process?
(c) Are $Y_n$ and $X_n$ jointly stationary random processes if $X_n$ is an iid process? a
stationary process?

9.67. Let $X_n$ be a zero-mean iid process, and let $Z_n$ be an autoregressive random process

$$Z_n = \tfrac{3}{4}Z_{n-1} + X_n, \qquad Z_0 = 0.$$

(a) Find the autocovariance of $Z_n$ and determine whether $Z_n$ is wide-sense stationary.
Hint: Express $Z_n$ in terms of $X_n, X_{n-1}, \ldots, X_1$.
(b) Does $Z_n$ eventually settle down into stationary behavior?
(c) Find the pdf of $Z_n$ if $X_n$ is an iid sequence of zero-mean, unit-variance Gaussian
random variables. What is the pdf of $Z_n$ as $n \to \infty$?

9.68. Let $Y(t) = X(t + s) - \beta X(t)$, where $X(t)$ is a wide-sense stationary random process.
(a) Determine whether $Y(t)$ is also a wide-sense stationary random process.
(b) Find the cross-covariance function of $Y(t)$ and $X(t)$. Are the processes jointly
wide-sense stationary?
(c) Find the pdf of Y(t) if X(t) is a Gaussian random process.

(d) Find the joint pdf of Y(t_1) and Y(t_2) in part c.

(e) Find the joint pdf of Y(t_1) and X(t_2) in part c.

Let X(t) and Y(t) be independent, wide-sense stationary random processes with zero
means and the same covariance function C_X(τ). Let Z(t) be defined by


Z(t) = 3X(t) - 5Y(t).


(a) Determine whether Z(t) is also wide-sense stationary.


(b) Determine the pdf of Z(t) if X(t) and Y(t) are also jointly Gaussian zero-mean ran-
dom processes with C_X(τ) = 4e^{-|τ|}.

(c) Find the joint pdf of Z(t_1) and Z(t_2) in part b.

(d) Find the cross-covariance between Z(t) and X(t). Are Z(t) and X(t) jointly station-
ary random processes?

(e) Find the joint pdf of Z(t_1) and X(t_2) in part b. Hint: Use auxiliary variables.


Let X(t) and Y(t) be independent, wide-sense stationary random processes with zero
means and the same covariance function C_X(τ). Let Z(t) be defined by


Z(t) = X(t) cos ωt + Y(t) sin ωt.


(a) Determine whether Z(t) is a wide-sense stationary random process.


(b) Determine the pdf of Z(t) if X(t) and Y(t) are also jointly Gaussian zero-mean ran-
dom processes with C_X(τ) = 4e^{-|τ|}.

(c) Find the joint pdf of Z(t_1) and Z(t_2) in part b.

(d) Find the cross-covariance between Z(t) and X(t). Are Z(t) and X(t) jointly station-
ary random processes?

(e) Find the joint pdf of Z(t_1) and X(t_2) in part b.

Let X(t) be a zero-mean, wide-sense stationary Gaussian random process with autocorre-
lation function R_X(τ). The output of a "square law detector" is

Y(t) = X^2(t).

Show that R_Y(τ) = R_X(0)^2 + 2R_X^2(τ). Hint: For zero-mean, jointly Gaussian random
variables E[X^2 Z^2] = E[X^2]E[Z^2] + 2E[XZ]^2.
A WSS process X(t) has mean 1 and autocorrelation function given in Fig. P9.3.


FIGURE P9.3


(a) Find the mean component of R_X(τ).
(b) Find the periodic component of R_X(τ).
(c) Find the remaining component of R_X(τ).
Let X_n and Y_n be independent random processes. A multiplexer combines these two se-
quences into a combined sequence U_k, that is,


U_{2n} = X_n,   U_{2n+1} = Y_n.


(a) Suppose that X_n and Y_n are independent Bernoulli random processes. Under
what conditions is U_k a stationary random process? a cyclostationary random
process?

(b) Repeat part a if X_n and Y_n are independent stationary random processes.

(c) Suppose that X_n and Y_n are wide-sense stationary random processes. Is U_k a wide-
sense stationary random process? a wide-sense cyclostationary random process?
Find the mean and autocovariance functions of U_k.


(d) If U_k is wide-sense cyclostationary, find the mean and correlation function of the
randomly phase-shifted version of U_k as defined by Eq. (9.72).


A ternary information source produces an iid, equiprobable sequence of symbols from 
the alphabet {a, b, c}. Suppose that these three symbols are encoded into the respective 
binary codewords 00, 01, 10. Let B_n be the sequence of binary symbols that result from
encoding the ternary symbols.


(a) Find the joint pmf of B_n and B_{n+1} for n even; n odd. Is B_n stationary? cyclostationary?


(b) Find the mean and covariance functions of B_n. Is B_n wide-sense stationary? wide-
sense cyclostationary?


(c) If B_n is cyclostationary, find the joint pmf, mean, and autocorrelation functions of the
randomly phase-shifted version of B_n as defined by Eq. (9.72).


Let s(t) be a periodic square wave with period T = 1 which is equal to 1 for the first half 
of a period and —1 for the remainder of the period. Let X(t) = As(t), where A is a ran- 
dom variable. 


(a) Find the mean and autocovariance functions of X(t).

(b) Is X(t) a mean-square periodic process?

(c) Find the mean and autocovariance of X_s(t), the randomly phase-shifted version of
X(t) given by Eq. (9.72).

Let X(t) = As(t) and Y(t) = Bs(t), where A and B are independent random variables 


that assume values +1 or —1 with equal probabilities, where s(t) is the periodic square 
wave in Problem 9.75. 


(a) Find the joint pmf of X(t_1) and Y(t_2).

(b) Find the cross-covariance of X(t) and Y(t). 

(c) Are X(t) and Y(t) jointly wide-sense cyclostationary? Jointly cyclostationary? 

Let X(t) be a mean square periodic random process. Is X(t) a wide-sense cyclostationary 
process? 

Is the pulse amplitude modulation random process in Example 9.38 cyclostationary? 

Let X(t) be the random amplitude sinusoid in Example 9.37. Find the mean and autocor- 
relation functions of the randomly phase-shifted version of X(t) given by Eq. (9.72). 
Complete the proof that if X(t) is a cyclostationary random process, then X_s(t), defined
by Eq. (9.72), is a stationary random process.

Show that if X(t) is a wide-sense cyclostationary random process, then X_s(t), defined by
Eq. (9.72), is a wide-sense stationary random process with mean and autocorrelation
functions given by Eqs. (9.74a) and (9.74b).
Section 9.7: Continuity, Derivatives, and Integrals of Random Processes


9.82. 


9.83. 


9.84. 


9.85. 


9.86. 


9.87. 


9.88. 


9.89. 
9.90. 


Let the random process X(t) = u(t - S) be a unit step function delayed by an exponen-
tial random variable S, that is, X(t) = 1 for t ≥ S, and X(t) = 0 for t < S.


(a) Find the autocorrelation function of X(t). 

(b) Is X(t) mean square continuous? 

(c) Does X(t) have a mean square derivative? If so, find its mean and autocorrelation 
functions. 

(d) Does X(t) have a mean square integral? If so, find its mean and autocovariance 
functions. 

Let X(t) be the random telegraph signal introduced in Example 9.24. 

(a) Is X(t) mean square continuous? 


(b) Show that X(t) does not have a mean square derivative, and show that the second 
mixed partial derivative of its autocorrelation function has a delta function. What 
gives rise to this delta function? 


(c) Does X(t) have a mean square integral? If so, find its mean and autocovariance 
functions. 


Let X(t) have autocorrelation function 


(a) Is X(t) mean square continuous? 

(b) Does X(t) have a mean square derivative? If so, find its mean and autocorrelation 
functions. 

(c) Does X(t) have a mean square integral? If so, find its mean and autocorrelation 
functions. 

(d) Is X(t) a Gaussian random process? 

Let N(t) be the Poisson process. Find E[(N(t) - N(t_0))^2] and use the result to show that
N(t) is mean square continuous.

Does the pulse amplitude modulation random process discussed in Example 9.38 have a 

mean square integral? If so, find its mean and autocovariance functions. 

Show that if X(t) is a mean square continuous random process, then X(t) has a mean 

square integral. Hint: Show that 


R_X(t_1, t_2) - R_X(t_0, t_0) = E[(X(t_1) - X(t_0))X(t_2)] + E[X(t_0)(X(t_2) - X(t_0))],


and then apply the Schwarz inequality to the two terms on the right-hand side. 

Let Y(t) be the mean square integral of X(t) in the interval (0, t). Show that Y'(t) is equal
to X(t) in the mean square sense.

Let X(t) be a wide-sense stationary random process. Show that E[X(t)X'(t)] = 0.

A linear system with input Z(t) is described by 


X'(t) + aX(t) = Z(t),   t ≥ 0,   X(0) = 0.


Find the output X(t) if the input is a zero-mean Gaussian random process with autocor- 
relation function given by 


R_Z(τ) = σ^2 e^{-β|τ|}.

Section 9.8: Time Averages of Random Processes and Ergodic Theorems


9.91. 
9.92. 


9.93. 


9.94. 


9.95. 


9.96. 


9.97. 


9.98. 


9.99. 


9.100. 


9.101. 


9.102. 


Find the variance of the time average given in Example 9.47. 
Are the following processes WSS and mean ergodic? 

(a) Discrete-time dice process in Problem 9.2. 

(b) Alternating sign process in Problem 9.3. 

(c) X_n = s^n, for n ≥ 0 in Problem 9.4.


Is the following WSS random process X(t) mean ergodic? 
0 |r| > 1 
R = 
x(7) 124 = Pay. inl St 


Let X(t) = A cos(2πft), where A is a random variable with mean m and variance σ^2.


(a) Evaluate <X(t)>_T, find its limit as T → ∞, and compare to m_X(t).
(b) Evaluate <X(t + τ)X(t)>_T, find its limit as T → ∞, and compare to R_X(t + τ, t).

Repeat Problem 9.94 with X(t) = A cos(2πft + Θ), where A is as in Problem 9.94, Θ is
a random variable uniformly distributed in (0, 2π), and A and Θ are independent ran-
dom variables.


Find an exact expression for VAR[<X(t)>_T] in Example 9.48. Find the limit as T → ∞.


The WSS random process X_n has mean m and autocovariance C_X(k) = (1/2). Is X_n
mean ergodic?


(a) Are the moving average processes Y_n in Problem 9.24 mean ergodic?
(b) Are the autoregressive processes Z_n in Problem 9.25a mean ergodic?
(a) Show that a WSS random process is mean ergodic if

\int_{-\infty}^{\infty} |C(u)| \, du < \infty.

(b) Show that a discrete-time WSS random process is mean ergodic if

\sum_{k=-\infty}^{\infty} |C(k)| < \infty.

Let <X^2(t)>_T denote a time-average estimate for the mean power of a WSS random
process.

(a) Under what conditions is this time average a valid estimate for E[X^2(t)]?

(b) Apply your result in part a for the random phase sinusoid in Example 9.2.


(a) Under what conditions is the time average <X(t + τ)X(t)>_T a valid estimate for
the autocorrelation R_X(τ) of a WSS random process X(t)?


(b) Apply your result in part a for the random phase sinusoid in Example 9.2.

Let Y(t) be the indicator function for the event {a < X(t) ≤ b}, that is,


Y(t) = 1 if X(t) ∈ (a, b],
Y(t) = 0 otherwise.


(a) Show that <Y(t)>_T is the proportion of time in the time interval (-T, T) that
X(t) ∈ (a, b].
(b) Find E[<Y(t)>_T].

(c) Under what conditions does <Y(t)>_T → P[a < X(t) ≤ b]?
(d) How can <Y(t)>_T be used to estimate P[X(t) ≤ x]?

(e) Apply the result in part d to the random telegraph signal. 


(a) Repeat Problem 9.102 for the time average of the discrete-time Y_n, which is defined
as the indicator for the event {a < X_n ≤ b}.

(b) Apply your result in part a to an iid discrete-valued random process. 

(c) Apply your result in part a to an iid continuous-valued random process. 

For n ≥ 1, define Z_n = u(a - X_n), where u(x) is the unit step function, that is, Z_n = 1 if
and only if X_n ≤ a.

(a) Show that the time average <Z_n>_N is the proportion of X_n's that are less than or
equal to a in the first N samples.

(b) Show that if the process is ergodic (in some sense), then this time average is equal to
F_X(a) = P[X ≤ a].

In Example 9.50 show that VAR[<X_n>_T] = σ^2 (2T + 1)^{2H-2}.


Plot the covariance function vs. k for the self-similar process in Example 9.50 with σ^2 = 1
for: H = 0.5, H = 0.6, H = 0.75, H = 0.99. Does the long-range dependence of the 
process increase or decrease with H? 


(a) Plot the variance of the sample mean given by Eq. (9.110) vs. T with σ^2 = 1 for:
H = 0.5, H = 0.6, H = 0.75, H = 0.99.

(b) For the parameters in part a, plot (2T + 1)^{2H-1} vs. T, which is the ratio of the vari-
ance of the sample mean of a long-range dependent process relative to the variance
of the sample mean of an iid process. How does the long-range dependence manifest
itself, especially for H approaching 1?


(c) Comment on the width of confidence intervals for estimates of the mean of long- 
range dependent processes relative to those of iid processes. 
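
The comparison in parts a and b can be tabulated numerically before plotting. The following is a minimal sketch, assuming that Eq. (9.110) has the form σ^2 (2T + 1)^{2H-2}, which is consistent with the ratio (2T + 1)^{2H-1} quoted in part b. The function names are illustrative and do not come from the text.

```python
import numpy as np

def var_sample_mean_lrd(T, H, sigma2=1.0):
    """Assumed form of Eq. (9.110): sigma^2 * (2T + 1)^(2H - 2)."""
    return sigma2 * (2.0 * T + 1.0) ** (2.0 * H - 2.0)

def var_sample_mean_iid(T, sigma2=1.0):
    """Variance of the sample mean of 2T + 1 iid samples with variance sigma^2."""
    return sigma2 / (2.0 * T + 1.0)

T = np.arange(1, 1001)
# Ratio of the two variances for each Hurst parameter H in the problem.
ratios = {H: var_sample_mean_lrd(T, H) / var_sample_mean_iid(T)
          for H in (0.5, 0.6, 0.75, 0.99)}
```

Each entry of `ratios` can be passed to a plotting routine against `T`; on a log-log scale the variance curve for a given H is a straight line under the assumed form.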

Plot the variance of the sample mean for a long-range dependent process (Eq. 9.110) vs. 

the sample size T in a log-log plot. 

(a) What role does H play in the plot? 

(b) One of the remarkable indicators of long-range dependence in nature comes from a 
set of observations of the minimal water levels in the Nile river for the years 
622-1281 [Beran, p. 22] where the log-log plot for part a gives a slope of -0.27. What
value of H corresponds to this slope? 

Problem 9.99b gives a sufficient condition for mean ergodicity for discrete-time random 

processes. Use the expression in Eq. (9.112) for a long-range dependent process to deter- 

mine whether the sufficient condition is satisfied. Comment on your findings. 


*Section 9.9: Fourier Series and Karhunen-Loeve Expansion 


9.110. 


Let X(t) = X e^{jωt}, where X is a random variable.


(a) Find the correlation function for X(t), which for complex-valued random processes
is defined by R_X(t_1, t_2) = E[X(t_1)X*(t_2)], where * denotes the complex conjugate.
(b) Under what conditions is X(t) a wide-sense stationary random process? 


9.111. 


9.112. 


9.113. 


9.114. 


9.115. 


9.116. 


9.117. 
Consider the sum of two complex exponentials with random coefficients:
X(t) = X_1 e^{jω_1 t} + X_2 e^{jω_2 t}, where ω_1 ≠ ω_2.


(a) Find the covariance function of X(t).

(b) Find conditions on the complex-valued random variables X_1 and X_2 for X(t) to be
a wide-sense stationary random process.

(c) Show that if we let ω_1 = -ω_2, X_1 = (U - jV)/2 and X_2 = (U + jV)/2, where U
and V are real-valued random variables, then X(t) is a real-valued random process.
Find an expression for X(t) and for the autocorrelation function.


(d) Restate the conditions on X_1 and X_2 from part b in terms of U and V.


(e) Suppose that in part c, U and V are jointly Gaussian random variables. Show that 
X(t) is a Gaussian random process. 


(a) Derive Eq. (9.118) for the correlation of the Fourier coefficients for a non-mean 
square periodic process X(t). 

(b) Show that Eq. (9.118) reduces to Eq. (9.117) when X(t) is WSS and mean square periodic.

Let X(t) be a WSS Gaussian random process with Ry(7) = e™. 

(a) Find the Fourier series expansion for X(t) in the interval [0, T]. 

(b) What is the distribution of the coefficients in the Fourier series? 

Show that the Karhunen-Loeve expansion of a WSS mean-square periodic process X(t) 

yields its Fourier series. Specify the orthonormal set of eigenfunctions and the corre- 

sponding eigenvalues. 

Let X(t) be the white Gaussian noise process introduced in Example 9.43. Show that any 

set of orthonormal functions can be used as the eigenfunctions for X(t) in its Karhunen- 

Loeve expansion. What are the eigenvalues? 

Let Y(t) = X(t) + W(t), where X(t) and W(t) are orthogonal random processes and 

W(t) is a white Gaussian noise process. Let φ_n(t) be the eigenfunctions corresponding to
K_X(t_1, t_2). Show that φ_n(t) are also the eigenfunctions for K_Y(t_1, t_2). What is the relation
between the eigenvalues of K_X(t_1, t_2) and those of K_Y(t_1, t_2)?

Let X(t) be a zero-mean random process with autocovariance 


R_X(τ) = σ^2 e^{-α|τ|}.


(a) Write the eigenvalue integral equation for the Karhunen-Loeve expansion of X(t) 
on the interval [—T, T]. 


(b) Differentiate the above integral equation to obtain the differential equation

\frac{d^2}{dt^2}\phi(t) = \frac{\alpha^2 (\lambda - 2\sigma^2/\alpha)}{\lambda} \, \phi(t).


(c) Show that the solutions to the above differential equation are of the form
φ(t) = A cos bt and φ(t) = B sin bt. Find an expression for b.
(d) Substitute the φ(t) from part c into the integral equation of part a to show that if
φ(t) = A cos bt, then b is the root of tan bT = α/b, and if φ(t) = B sin bt, then b is
the root of tan bT = -b/α.

(e) Find the values of A and B that normalize the eigenfunctions.

*(f) In order to show that the frequencies of the eigenfunctions are not harmonically re-
lated, plot the following three functions versus bT: tan bT, bT/αT, -αT/bT. The in-
tersections of these functions yield the eigenvalues. Note that there are two roots per
interval of length π.


*Section 9.10: Generating Random Processes 


9.118. 


9.119. 
9.120. 


9.121. 


9.122. 


9.123. 


9.124. 


(a) Generate 10 realizations of the binomial counting process with p = 1/4, p = 1/2, 
and p = 3/4. For each value of p, plot the sample functions for n = 200 trials. 

(b) Generate 50 realizations of the binomial counting process with p = 1/2. Find the 
sample mean and sample variance of the realizations for the first 200 trials. 

(c) In part b, find the histogram of increments in the process for the interval [1, 50], 
[51, 100], [101, 150], and [151, 200]. Compare these histograms to the theoretical 
pmf. How would you check to see if the increments in the four intervals are 
stationary? 

(d) Plot a scattergram of the pairs consisting of the increments in the interval [1, 50] and
[51, 100] in a given realization. Devise a test to check whether the increments in the 
two intervals are independent random variables. 
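
One way to set up the experiment in parts a through d is sketched below in Python with NumPy. This is an illustrative arrangement under stated assumptions, not a prescribed solution: the helper name `binomial_counting`, the seed, and the array layout are all hypothetical choices.

```python
import numpy as np

def binomial_counting(p, n_steps, n_paths, rng):
    """Return an (n_paths, n_steps) array whose row i holds S_1, ..., S_n for one realization."""
    bern = rng.random((n_paths, n_steps)) < p   # iid Bernoulli(p) trials
    return np.cumsum(bern, axis=1)              # running count of successes

rng = np.random.default_rng(1)
paths = binomial_counting(0.5, 200, 50, rng)    # part b: 50 realizations, 200 trials

# Sample mean and sample variance across realizations at each time n.
sample_mean = paths.mean(axis=0)
sample_var = paths.var(axis=0, ddof=1)

# Increments over [1, 50], [51, 100], [101, 150], [151, 200] (part c).
padded = np.concatenate([np.zeros((paths.shape[0], 1), dtype=int), paths], axis=1)
increments = np.diff(padded[:, [0, 50, 100, 150, 200]], axis=1)
```

The columns of `increments` can then be histogrammed separately, and the first two columns paired for the scattergram in part d.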

Repeat Problem 9.118 for the random walk process with the same parameters. 

Repeat Problem 9.118 for the sum process in Eq. (9.24) where the X_n are iid unit-variance

Gaussian random variables with mean: m = 0; m = 0.5. 

Repeat Problem 9.118 for the sum process in Eq. (9.24) where the X,, are iid Poisson ran- 

dom variables with a = 1. 

Repeat Problem 9.118 for the sum process in Eq. (9.24) where the X,, are iid Cauchy ran- 

dom variables with a = 1. 

Let Y_n = aY_{n-1} + X_n, where Y_0 = 0.

(a) Generate five realizations of the process for a = 1/4, 1/2, 9/10 and with X,, given by 
the p = 1/2 and p = 1/4 random step process. Plot the sample functions for the first 
200 steps. Find the sample mean and sample variance for the outcomes in each real- 
ization. Plot the histogram for outcomes in each realization. 

(b) Generate 50 realizations of the process Y,, with a = 1/2, p = 1/4, and p = 1/2. Find 
the sample mean and sample variance of the realizations for the first 200 trials. Find 
the histogram of Y_n across the realizations at times n = 5, n = 50, and n = 200.

(c) In part b, find the histogram of increments in the process for the interval [1, 50], [51, 
100], [101, 150], and [151, 200]. To what theoretical pmf should these histograms be 
compared? Should the increments in the process be stationary? Should the incre- 
ments be independent? 

Repeat Problem 9.123 for the sum process in Eq. (9.24) where the X_n are iid unit-variance

Gaussian random variables with mean: m = 0; m = 0.5. 


9.125. 


9.126. 


9.127. 


9.128. 


9.129. 


9.130. 
(a) Propose a method for estimating the covariance function of the sum process in
Problem 9.118. Do not assume that the process is wide-sense stationary. 

(b) How would you check to see if the process is wide-sense stationary? 

(c) Apply the methods in parts a and b to the experiment in Problem 9.118b. 

(d) Repeat part c for Problem 9.123b. 

Use the binomial process to approximate a Poisson random process with arrival rate 

λ = 1 customer per second in the time interval (0, 100]. Try different values of n and

come up with a recommendation on how n should be selected. 

Generate 100 repetitions of the experiment in Example 9.21. 

(a) Find the relative frequency of the event {N(10) = 3 and N(60) - N(45) = 2}
and compare it to the theoretical probability. 

(b) Find the histogram of the time that elapses until the second arrival and compare it to 
the theoretical pdf. Plot the empirical cdf and compare it to the theoretical cdf. 
Generate 100 realizations of the Poisson random process M(t) with arrival rate λ = 1
customer per second in the time interval (0, 10]. Generate the pair (N_1(t), N_2(t)) by as-
signing arrivals in M(t) to N_1(t) with probability p = 0.25 and to N_2(t) with probability
0.75.

(a) Find the histograms for N_1(10) and N_2(10) and compare them to the theoretical pmf
by performing a chi-square goodness-of-fit test at a 5% significance level.

(b) Perform a chi-square goodness-of-fit test to test whether N_1(10) and N_2(10) are in-
dependent random variables. How would you check whether N_1(t) and N_2(t) are
independent random processes? 

Subscribers log on to a system according to a Poisson process with arrival rate λ = 1 cus-
tomer per second. The ith customer remains logged on for a random duration of T_i sec-
onds, where the T_i are iid random variables and are also independent of the arrival times.

(a) Generate the sequence S_n of customer arrival times and the corresponding
departure times given by D_n = S_n + T_n, where the connection times are all equal
to 1.

(b) Plot: A(t), the number of arrivals up to time t; D(t), the number of departures up to
time t; and N(t) = A(t) - D(t), the number in the system at time t.

(c) Perform 100 simulations of the system operation for a duration of 200 seconds. As-
sume that customer connection times are exponential random variables with mean
5 seconds. Find the customer departure time instants and the associated departure
counting process D(t). How would you check whether D(t) is a Poisson process? Find
the histograms for D(t) and the number in the system N(t) at t = 50, 100, 150, 200. Try
to fit a pmf to each histogram.

(d) Repeat part c if customer connection times are exactly 5 seconds long. 

Generate 100 realizations of the Wiener process with a = 1 for the interval (0, 3.5) using 

the random walk limiting procedure. 

(a) Find the histograms for increments in the intervals (0, 0.5], (0.5, 1.5], and (1.5, 3.5] 
and compare these to the theoretical pdf. 

(b) Perform a test at a 5% significance level to determine whether the increments in the
first two intervals are independent random variables. 
9.131. Repeat Problem 9.130 using Gaussian-distributed increments to generate the Wiener
process. Discuss how the increment interval in the simulation should be selected.
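
A minimal sketch of the Gaussian-increment approach follows. It assumes the Wiener process has variance αt at time t, so that an increment over a step of length dt is N(0, α dt). The helper name `wiener_paths`, the step size 0.01, and the use of 2000 paths (rather than the 100 asked for, to make the sample variance check less noisy) are illustrative choices.

```python
import numpy as np

def wiener_paths(alpha, t_end, dt, n_paths, rng):
    """Simulate Wiener paths on [0, t_end] from independent N(0, alpha*dt) increments."""
    n = int(round(t_end / dt))
    steps = rng.normal(0.0, np.sqrt(alpha * dt), size=(n_paths, n))
    # Prepend W(0) = 0 and accumulate the increments.
    w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)
    t = np.arange(n + 1) * dt
    return t, w

rng = np.random.default_rng(2)
t, w = wiener_paths(alpha=1.0, t_end=3.5, dt=0.01, n_paths=2000, rng=rng)

i0, i1 = 50, 150              # grid indices of t = 0.5 and t = 1.5
incr = w[:, i1] - w[:, i0]    # increment over (0.5, 1.5], nominally N(0, 1)
```

With Gaussian increments the sampled values have the target distribution at the grid points for any dt, so in this sketch dt only sets the time resolution of the plotted paths.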


Problems Requiring Cumulative Knowledge 


9.132. 


9.133. 


9.134. 


9.135. 


Let X(t) be a random process with independent increments. Assume that the increments
X(t_2) - X(t_1) are gamma random variables with parameters λ > 0 and α = t_2 - t_1.


(a) Find the joint density function of X(t_1) and X(t_2).
(b) Find the autocorrelation function of X(t). 

(c) Is X(t) mean square continuous? 

(d) Does X(t) have a mean square derivative? 


Let X(t) be the pulse amplitude modulation process introduced in Example 9.38 with 
T = 1. A phase-modulated process is defined by


Y(t) = a cos( 2 ¥ Txo) 


(a) Plot the sample function of Y(t) corresponding to the binary sequence 0010110. 
(b) Find the joint pdf of Y(t_1) and Y(t_2).

(c) Find the mean and autocorrelation functions of Y(t). 

(d) Is Y(t) a stationary, wide-sense stationary, or cyclostationary random process? 
(e) Is Y(t) mean square continuous? 


(f) Does Y(t) have a mean square derivative? If so, find its mean and autocorrelation 
functions. 


Let N(t) be the Poisson process, and suppose we form the phase-modulated process 
Y(t) = a cos(2πft + πN(t)).


(a) Plot a sample function of Y(t) corresponding to a typical sample function of N(t).

(b) Find the joint density function of Y(t_1) and Y(t_2). Hint: Use the independent incre-
ments property of N(t).

(c) Find the mean and autocorrelation functions of Y(t). 

(d) Is Y(t) a stationary, wide-sense stationary, or cyclostationary random process? 

(e) Is Y(t) mean square continuous? 


(f) Does Y(t) have a mean square derivative? If so, find its mean and autocorrelation 
functions. 


Let X(t) be a train of amplitude-modulated pulses with occurrences according to a Pois- 
son process: 


X(t) = \sum_{k} A_k h(t - S_k),


where the A_k are iid random variables, the S_k are the event occurrence times in a Poisson
process, and h(t) is a function of time. Assume the amplitudes and occurrence times are 
independent. 


(a) Find the mean and autocorrelation functions of X(t). 
(b) Evaluate part a when h(t) = u(t), a unit step function. 
(c) Evaluate part a when h(t) = p(t), a rectangular pulse of duration T seconds. 


9.136. 


9.137. 


9.138. 


9.139. 
Consider a linear combination of two sinusoids:
X(t) = A_1 cos(ω_0 t + Θ_1) + A_2 cos(√2 ω_0 t + Θ_2),


where Θ_1 and Θ_2 are independent uniform random variables in the interval (0, 2π), and
A_1 and A_2 are jointly Gaussian random variables. Assume that the amplitudes are inde-
pendent of the phase random variables.


(a) Find the mean and autocorrelation functions of X(t).

(b) Is X(t) mean square periodic? If so, what is the period?

(c) Find the joint pdf of X(t_1) and X(t_2).

(a) A Gauss-Markov random process is a Gaussian random process that is also a Markov
process. Show that the autocovariance function of such a process must satisfy

C_X(t_3, t_1) = \frac{C_X(t_3, t_2) \, C_X(t_2, t_1)}{C_X(t_2, t_2)},

where t_1 ≤ t_2 ≤ t_3.

(b) It can be shown that if the autocovariance of a Gaussian random process satisfies 
the above equation, then the process is Gauss-Markov. Is the Wiener process Gauss- 
Markov? Is the Ornstein-Uhlenbeck process Gauss-Markov? 


Let A_n and B_n be two independent stationary random processes. Suppose that A_n and B_n
are zero-mean, Gaussian random processes with autocorrelation functions


R_A(k) = σ_1^2 ρ_1^{|k|},   R_B(k) = σ_2^2 ρ_2^{|k|}.


A block multiplexer takes blocks of two from the above processes and interleaves them
to form the random process Y_m:


A_1 A_2 B_1 B_2 A_3 A_4 B_3 B_4 A_5 A_6 B_5 B_6 ...


(a) Find the autocorrelation function of Y_m.

(b) Is Y_m cyclostationary? wide-sense stationary?

(c) Find the joint pdf of Y_m and Y_{m+1}.

(d) Let Z_m = Y_{m+T}, where T is selected uniformly from the set {0, 1, 2, 3}. Repeat
parts a, b, and c for Z_m.


Let A_n be the Gaussian random process in Problem 9.138. A decimator takes every other
sample to form the random process V_m:


A_1 A_3 A_5 A_7 A_9 A_11 ...


(a) Find the autocorrelation function of V_m.

(b) Find the joint pdf of V_m and V_{m+k}.

(c) An interpolator takes the sequence V_m and inserts zeros between samples to form
the sequence W_k:


A_1 0 A_3 0 A_5 0 A_7 0 A_9 0 A_11 ...


Find the autocorrelation function of W_k. Is W_k a Gaussian random process?
9.140. Let A_n be a sequence of zero-mean, unit-variance independent Gaussian random variables.
A block coder takes pairs of A's and linearly transforms them to form the sequence Y_n:


\begin{bmatrix} Y_{2n} \\ Y_{2n+1} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} A_{2n} \\ A_{2n+1} \end{bmatrix}.


(a) Find the autocorrelation function of Y_n.
(b) Is Y_n stationary in any sense?
(c) Find the joint pdf of Y_n, Y_{n+1}, and Y_{n+2}.


9.141. Suppose customer orders arrive according to a Bernoulli random process with parameter p.
When an order arrives, its size is an exponential random variable with parameter λ. Let S_n
be the total size of all orders up to time n.


(a) Find the mean and autocorrelation functions of S_n.
(b) Is S_n a stationary random process?

(c) Is S_n a Markov process?

(d) Find the joint pdf of S_n and S_{n+1}.


CHAPTER 10

Analysis and Processing of Random Signals


In this chapter we introduce methods for analyzing and processing random signals. We
cover the following topics:


• Section 10.1 introduces the notion of power spectral density, which allows us to
view random processes in the frequency domain.

• Section 10.2 discusses the response of linear systems to random process inputs
and introduces methods for filtering random processes.

• Section 10.3 considers two important applications of signal processing: sampling
and modulation.

• Sections 10.4 and 10.5 discuss the design of optimum linear systems and intro-
duce the Wiener and Kalman filters.

• Section 10.6 addresses the problem of estimating the power spectral density of a
random process.

• Finally, Section 10.7 introduces methods for implementing and simulating the
processing of random signals.


10.1 POWER SPECTRAL DENSITY


The Fourier series and the Fourier transform allow us to view deterministic time func- 
tions as the weighted sum or integral of sinusoidal functions. A time function that 
varies slowly has the weighting concentrated at the low-frequency sinusoidal compo- 
nents. A time function that varies rapidly has the weighting concentrated at higher-fre- 
quency components. Thus the rate at which a deterministic time function varies is 
related to the weighting function of the Fourier series or transform. This weighting 
function is called the “spectrum” of the time function. 

The notion of a time function as being composed of sinusoidal components is also 
very useful for random processes. However, since a sample function of a random 
process can be viewed as being selected from an ensemble of allowable time functions, 
the weighting function or “spectrum” for a random process must refer in some way to 
the average rate of change of the ensemble of allowable time functions. Equation 
(9.66) shows that, for wide-sense stationary processes, the autocorrelation function 
R_X(τ) is an appropriate measure for the average rate of change of a random process.
Indeed if a random process changes slowly with time, then it remains correlated with it-
self for a long period of time, and R_X(τ) decreases slowly as a function of τ. On the
other hand, a rapidly varying random process quickly becomes uncorrelated with itself,
and R_X(τ) decreases rapidly with τ.

We now present the Einstein-Wiener-Khinchin theorem, which states that the
power spectral density of a wide-sense stationary random process is given by the Fouri-
er transform of the autocorrelation function.¹


Continuous-Time Random Processes 


Let X(t) be a continuous-time WSS random process with mean m_X and autocorrela-
tion function R_X(τ). Suppose we take the Fourier transform of a sample of X(t) in the
interval 0 < t < T as follows:


\tilde{x}(f) = \int_0^T X(t') e^{-j2\pi f t'} \, dt'.    (10.1)


We then approximate the power density as a function of frequency by the function


\tilde{p}_T(f) = \frac{1}{T} |\tilde{x}(f)|^2 = \frac{1}{T} \tilde{x}(f) \tilde{x}^*(f)
             = \frac{1}{T} \left\{ \int_0^T X(t') e^{-j2\pi f t'} \, dt' \right\} \left\{ \int_0^T X(t') e^{j2\pi f t'} \, dt' \right\},    (10.2)


where * denotes the complex conjugate. X(t) is a random process, so p̃_T(f) is also a
random process but over a different index set. p̃_T(f) is called the periodogram esti-
mate, and we are interested in the power spectral density of X(t), which is defined by


S_X(f) = \lim_{T \to \infty} E[\tilde{p}_T(f)] = \lim_{T \to \infty} \frac{1}{T} E[|\tilde{x}(f)|^2].    (10.3)

We show at the end of this section that the power spectral density of X(t) is given by the
Fourier transform of the autocorrelation function:


S_X(f) = \mathcal{F}\{R_X(\tau)\} = \int_{-\infty}^{\infty} R_X(\tau) e^{-j2\pi f \tau} \, d\tau.    (10.4)


A table of Fourier transforms and their properties is given in Appendix B.
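
The definition in Eq. (10.3) can be illustrated numerically. The sketch below simulates random telegraph paths on a fine time grid, approximates the integral in Eq. (10.1) by a Riemann sum computed with the FFT, and averages the periodogram over many realizations. The result is compared with the closed form 4α/(4α^2 + 4π^2 f^2) that Example 10.1 derives later in this section. The helper names, the grid spacing, the observation length, and the small-step flipping rule are all assumptions made for illustration.

```python
import numpy as np

def telegraph_paths(alpha, T, dt, n_paths, rng):
    """Sample +/-1 random telegraph paths on a grid of spacing dt (illustrative helper)."""
    n = int(round(T / dt))
    # Small-dt approximation: each step flips the sign with probability alpha*dt.
    flips = rng.random((n_paths, n)) < alpha * dt
    signs = np.where(np.cumsum(flips, axis=1) % 2 == 0, 1.0, -1.0)
    x0 = rng.choice([-1.0, 1.0], size=(n_paths, 1))  # equiprobable starting value
    return x0 * signs

def averaged_periodogram(x, dt):
    """Average (1/T)|x~(f)|^2 over realizations, following Eqs. (10.1)-(10.3)."""
    n = x.shape[1]
    T = n * dt
    x_tilde = np.fft.rfft(x, axis=1) * dt   # Riemann-sum approximation of Eq. (10.1)
    p_T = np.abs(x_tilde) ** 2 / T          # periodogram estimate, Eq. (10.2)
    freqs = np.fft.rfftfreq(n, d=dt)        # frequencies k/T in Hz
    return freqs, p_T.mean(axis=0)          # sample average standing in for E[.]

rng = np.random.default_rng(0)
alpha, T, dt = 1.0, 50.0, 0.01
freqs, S_est = averaged_periodogram(telegraph_paths(alpha, T, dt, 200, rng), dt)
S_theory = 4 * alpha / (4 * alpha**2 + 4 * np.pi**2 * freqs**2)
```

A single periodogram fluctuates strongly from one realization to the next, which is why the sketch averages over realizations; the finite observation length T also introduces a small bias that shrinks as T grows.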
For real-valued random processes, the autocorrelation function is an even
function of τ:
R_X(τ) = R_X(-τ).    (10.5)


¹This result is usually called the Wiener-Khinchin theorem, after Norbert Wiener and A. Ya. Khinchin, who
proved the result in the early 1930s. Later it was discovered that this result was stated by Albert Einstein in a 
1914 paper (see Einstein). 
Substitution into Eq. (10.4) implies that

S_X(f) = \int_{-\infty}^{\infty} R_X(\tau) \{\cos 2\pi f\tau - j \sin 2\pi f\tau\} \, d\tau
       = \int_{-\infty}^{\infty} R_X(\tau) \cos 2\pi f\tau \, d\tau,    (10.6)


since the integral of the product of an even function (R_X(τ)) and an odd function
(sin 2πfτ) is zero. Equation (10.6) implies that S_X(f) is real-valued and an even func-
tion of f. From Eqs. (10.2) and (10.3) we have that S_X(f) is nonnegative, since it is the
limit of the expected value of the nonnegative periodogram estimate:


S_X(f) \ge 0   for all f.    (10.7)


The autocorrelation function can be recovered from the power spectral density
by applying the inverse Fourier transform formula to Eq. (10.4):


R_X(\tau) = \mathcal{F}^{-1}\{S_X(f)\} = \int_{-\infty}^{\infty} S_X(f) e^{j2\pi f\tau} \, df.    (10.8)


Equation (10.8) has the same form as Eq. (4.80), which relates the pdf to its corresponding
characteristic function. The last section in this chapter discusses how the FFT can be
used to perform numerical calculations for S_X(f) and R_X(τ).

In electrical engineering it is customary to refer to the second moment of X(t) as
the average power of X(t).² Equation (10.8) together with Eq. (9.64) gives


E[X^2(t)] = R_X(0) = \int_{-\infty}^{\infty} S_X(f) \, df.    (10.9)

Equation (10.9) states that the average power of X(t) is obtained by integrating S_X(f)
over all frequencies. This is consistent with the fact that S_X(f) is the "density of power"
of X(t) at the frequency f.

Since the autocorrelation and autocovariance functions are related by $R_X(\tau) = C_X(\tau) + m_X^2$, the power spectral density is also given by

$$\begin{aligned}
S_X(f) &= \mathcal{F}\{C_X(\tau) + m_X^2\} \\
&= \mathcal{F}\{C_X(\tau)\} + m_X^2\,\delta(f),
\end{aligned} \tag{10.10}$$

where we have used the fact that the Fourier transform of a constant is a delta function. We say that $m_X$ is the "dc" component of $X(t)$.

The notion of power spectral density can be generalized to two jointly wide-sense stationary processes. The cross-power spectral density $S_{X,Y}(f)$ is defined by

$$S_{X,Y}(f) = \mathcal{F}\{R_{X,Y}(\tau)\}, \tag{10.11}$$


² If $X(t)$ is a voltage or current developed across a 1-ohm resistor, then $X^2(t)$ is the instantaneous power absorbed by the resistor.

FIGURE 10.1
Power spectral density of a random telegraph signal with $\alpha = 1$ and $\alpha = 2$ transitions per second.


where $R_{X,Y}(\tau)$ is the cross-correlation between $X(t)$ and $Y(t)$:

$$R_{X,Y}(\tau) = E[X(t+\tau)Y(t)]. \tag{10.12}$$

In general, $S_{X,Y}(f)$ is a complex function of $f$ even if $X(t)$ and $Y(t)$ are both real-valued.


Example 10.1 Random Telegraph Signal 


Find the power spectral density of the random telegraph signal.
In Example 9.24, the autocorrelation function of the random telegraph process was found to be

$$R_X(\tau) = e^{-2\alpha|\tau|},$$

where $\alpha$ is the average transition rate of the signal. Therefore, the power spectral density of the process is

$$\begin{aligned}
S_X(f) &= \int_{-\infty}^{0} e^{2\alpha\tau} e^{-j2\pi f\tau}\, d\tau + \int_{0}^{\infty} e^{-2\alpha\tau} e^{-j2\pi f\tau}\, d\tau \\
&= \frac{1}{2\alpha - j2\pi f} + \frac{1}{2\alpha + j2\pi f} \\
&= \frac{4\alpha}{4\alpha^2 + 4\pi^2 f^2}.
\end{aligned} \tag{10.13}$$

Figure 10.1 shows the power spectral density for $\alpha = 1$ and $\alpha = 2$ transitions per second. The process changes twice as quickly when $\alpha = 2$; it can be seen from the figure that the power spectral density for $\alpha = 2$ has greater high-frequency content.
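The closed form in Eq. (10.13) can be checked numerically. The following is a minimal sketch, assuming the autocorrelation $R_X(\tau) = e^{-2\alpha|\tau|}$ of the random telegraph signal; it approximates the cosine integral of Eq. (10.6) with a midpoint rule. The function names, the truncation point `tau_max`, and the step count are illustrative choices, not part of the text.

```python
import math


def telegraph_autocorr(tau, alpha):
    """R_X(tau) = exp(-2*alpha*|tau|) for the random telegraph signal."""
    return math.exp(-2.0 * alpha * abs(tau))


def telegraph_psd_closed_form(f, alpha):
    """Eq. (10.13): S_X(f) = 4*alpha / (4*alpha^2 + 4*pi^2*f^2)."""
    return 4.0 * alpha / (4.0 * alpha ** 2 + 4.0 * math.pi ** 2 * f ** 2)


def telegraph_psd_numeric(f, alpha, tau_max=20.0, steps=200_000):
    """Midpoint-rule approximation of Eq. (10.6).

    R_X is even, so S_X(f) = 2 * integral_0^inf R_X(tau) cos(2 pi f tau) d tau.
    The upper limit is truncated at tau_max (an assumption that is harmless
    here because R_X decays exponentially).
    """
    d_tau = tau_max / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * d_tau
        total += telegraph_autocorr(tau, alpha) * math.cos(2.0 * math.pi * f * tau)
    return 2.0 * total * d_tau


if __name__ == "__main__":
    for alpha in (1.0, 2.0):
        for f in (0.0, 0.5, 1.0):
            print(alpha, f,
                  telegraph_psd_closed_form(f, alpha),
                  telegraph_psd_numeric(f, alpha))
```

Comparing the two columns for $\alpha = 1$ and $\alpha = 2$ at a fixed nonzero frequency also illustrates the remark about greater high-frequency content for the faster process.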


Example 10.2 Sinusoid with Random Phase 


Let $X(t) = a\cos(2\pi f_0 t + \Theta)$, where $\Theta$ is uniformly distributed in the interval $(0, 2\pi)$. Find $S_X(f)$.

From Example 9.10, the autocorrelation for $X(t)$ is

$$R_X(\tau) = \frac{a^2}{2}\cos 2\pi f_0\tau.$$

Thus, the power spectral density is

$$\begin{aligned}
S_X(f) &= \frac{a^2}{2}\,\mathcal{F}\{\cos 2\pi f_0\tau\} \\
&= \frac{a^2}{4}\,\delta(f - f_0) + \frac{a^2}{4}\,\delta(f + f_0),
\end{aligned} \tag{10.14}$$

where we have used the table of Fourier transforms in Appendix B. The signal has average power $R_X(0) = a^2/2$. All of this power is concentrated at the frequencies $\pm f_0$, so the power density at these frequencies is infinite.


Example 10.3 White Noise 


The power spectral density of a WSS white noise process whose frequency components are limited to the range $-W \le f \le W$ is shown in Fig. 10.2(a). The process is said to be "white" in analogy to white light, which contains all frequencies in equal amounts. The average power in this process is obtained from Eq. (10.9):

$$E[X^2(t)] = \int_{-W}^{W} \frac{N_0}{2}\, df = N_0 W. \tag{10.15}$$

FIGURE 10.2
Bandlimited white noise: (a) power spectral density, (b) autocorrelation function.

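The autocorrelation for this process is obtained from Eq. (10.8):

$$\begin{aligned}
R_X(\tau) &= \frac{1}{2}N_0 \int_{-W}^{W} e^{j2\pi f\tau}\, df \\
&= \frac{1}{2}N_0\, \frac{e^{j2\pi W\tau} - e^{-j2\pi W\tau}}{j2\pi\tau} \\
&= \frac{N_0 \sin(2\pi W\tau)}{2\pi\tau}.
\end{aligned} \tag{10.16}$$

$R_X(\tau)$ is shown in Fig. 10.2(b). Note that $X(t)$ and $X(t+\tau)$ are uncorrelated at $\tau = \pm k/2W$, $k = 1, 2, \ldots$.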

The term white noise usually refers to a random process $W(t)$ whose power spectral density is $N_0/2$ for all frequencies:

$$S_W(f) = \frac{N_0}{2} \quad \text{for all } f. \tag{10.17}$$

Equation (10.15) with $W = \infty$ shows that such a process must have infinite average power. By taking the limit $W \to \infty$ in Eq. (10.16), we find that the autocorrelation of such a process approaches

$$R_W(\tau) = \frac{N_0}{2}\,\delta(\tau). \tag{10.18}$$

If $W(t)$ is a Gaussian random process, we then see that $W(t)$ is the white Gaussian noise process introduced in Example 9.43 with $\alpha = N_0/2$.


Example 10.4 Sum of Two Processes 


Find the power spectral density of $Z(t) = X(t) + Y(t)$, where $X(t)$ and $Y(t)$ are jointly WSS processes.
The autocorrelation of $Z(t)$ is

$$\begin{aligned}
R_Z(\tau) &= E[Z(t+\tau)Z(t)] = E[(X(t+\tau) + Y(t+\tau))(X(t) + Y(t))] \\
&= R_X(\tau) + R_{YX}(\tau) + R_{XY}(\tau) + R_Y(\tau).
\end{aligned}$$

The power spectral density is then

$$\begin{aligned}
S_Z(f) &= \mathcal{F}\{R_X(\tau) + R_{YX}(\tau) + R_{XY}(\tau) + R_Y(\tau)\} \\
&= S_X(f) + S_{YX}(f) + S_{XY}(f) + S_Y(f).
\end{aligned} \tag{10.19}$$


Example 10.5 


Let $Y(t) = X(t - d)$, where $d$ is a constant delay and where $X(t)$ is WSS. Find $R_{YX}(\tau)$, $S_{YX}(f)$, $R_Y(\tau)$, and $S_Y(f)$.

The definitions of $R_{YX}(\tau)$, $S_{YX}(f)$, and $R_Y(\tau)$ give

$$R_{YX}(\tau) = E[Y(t+\tau)X(t)] = E[X(t+\tau-d)X(t)] = R_X(\tau - d). \tag{10.20}$$

The time-shifting property of the Fourier transform gives

$$\begin{aligned}
S_{YX}(f) &= \mathcal{F}\{R_X(\tau - d)\} = S_X(f)\, e^{-j2\pi f d} \\
&= S_X(f)\cos(2\pi f d) - jS_X(f)\sin(2\pi f d).
\end{aligned} \tag{10.21}$$

Finally,

$$R_Y(\tau) = E[Y(t+\tau)Y(t)] = E[X(t+\tau-d)X(t-d)] = R_X(\tau). \tag{10.22}$$

Equation (10.22) implies that

$$S_Y(f) = \mathcal{F}\{R_Y(\tau)\} = \mathcal{F}\{R_X(\tau)\} = S_X(f). \tag{10.23}$$

Note from Eq. (10.21) that the cross-power spectral density is complex. Note from Eq. (10.23) that $S_X(f) = S_Y(f)$ despite the fact that $X(t) \ne Y(t)$. Thus, $S_X(f) = S_Y(f)$ does not imply that $X(t) = Y(t)$.

10.1.2 Discrete-Time Random Processes


Let $X_n$ be a discrete-time WSS random process with mean $m_X$ and autocorrelation function $R_X(k)$. The power spectral density of $X_n$ is defined as the Fourier transform of the autocorrelation sequence

$$S_X(f) = \mathcal{F}\{R_X(k)\} = \sum_{k=-\infty}^{\infty} R_X(k)\, e^{-j2\pi f k}. \tag{10.24}$$

Note that we need only consider frequencies in the range $-1/2 < f \le 1/2$, since $S_X(f)$ is periodic in $f$ with period 1. As in the case of continuous random processes, $S_X(f)$ can be shown to be a real-valued, nonnegative, even function of $f$.
The inverse Fourier transform formula applied to Eq. (10.24) implies that³

$$R_X(k) = \int_{-1/2}^{1/2} S_X(f)\, e^{j2\pi f k}\, df. \tag{10.25}$$

Equations (10.24) and (10.25) are similar to the discrete Fourier transform. In the last section we show how to use the FFT to calculate $S_X(f)$ and $R_X(k)$.

The cross-power spectral density $S_{X,Y}(f)$ of two jointly WSS discrete-time processes $X_n$ and $Y_n$ is defined by

$$S_{X,Y}(f) = \mathcal{F}\{R_{X,Y}(k)\}, \tag{10.26}$$

where $R_{X,Y}(k)$ is the cross-correlation between $X_n$ and $Y_n$:

$$R_{X,Y}(k) = E[X_{n+k}Y_n]. \tag{10.27}$$

³ You can view $R_X(k)$ as the coefficients of the Fourier series of the periodic function $S_X(f)$.
Example 10.6 White Noise

Let the process $X_n$ be a sequence of uncorrelated random variables with zero mean and variance $\sigma_X^2$. Find $S_X(f)$.
The autocorrelation of this process is

$$R_X(k) = \begin{cases} \sigma_X^2 & k = 0 \\ 0 & k \ne 0. \end{cases}$$

The power spectral density of the process is found by substituting $R_X(k)$ into Eq. (10.24):

$$S_X(f) = \sigma_X^2, \qquad -\frac{1}{2} < f < \frac{1}{2}. \tag{10.28}$$

Thus the process $X_n$ contains all possible frequencies in equal measure.


Example 10.7 Moving Average Process 
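Let the process $Y_n$ be defined by

$$Y_n = X_n + \alpha X_{n-1}, \tag{10.29}$$

where $X_n$ is the white noise process of Example 10.6. Find $S_Y(f)$.
It is easily shown that the mean and autocorrelation of $Y_n$ are given by

$$E[Y_n] = 0,$$

and

$$E[Y_n Y_{n+k}] = \begin{cases} (1 + \alpha^2)\sigma_X^2 & k = 0 \\ \alpha\sigma_X^2 & k = \pm 1 \\ 0 & \text{otherwise.} \end{cases} \tag{10.30}$$

The power spectral density is then

$$\begin{aligned}
S_Y(f) &= (1 + \alpha^2)\sigma_X^2 + \alpha\sigma_X^2\{e^{j2\pi f} + e^{-j2\pi f}\} \\
&= \sigma_X^2\{(1 + \alpha^2) + 2\alpha\cos 2\pi f\}.
\end{aligned} \tag{10.31}$$

$S_Y(f)$ is shown in Fig. 10.3 for $\alpha = 1$.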
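As a quick consistency check, the sum in Eq. (10.24) can be evaluated directly for the moving average autocorrelation of Eq. (10.30) and compared with the closed form of Eq. (10.31). The sketch below is illustrative only; the function names are hypothetical and the parameter values are arbitrary.

```python
import cmath
import math


def ma_autocorr(k, alpha, var_x):
    """Eq. (10.30): autocorrelation of Y_n = X_n + alpha * X_{n-1}."""
    if k == 0:
        return (1.0 + alpha ** 2) * var_x
    if abs(k) == 1:
        return alpha * var_x
    return 0.0


def ma_psd_from_autocorr(f, alpha, var_x):
    """Eq. (10.24) restricted to lags -1, 0, 1 (all other lags are zero)."""
    total = 0j
    for k in (-1, 0, 1):
        total += ma_autocorr(k, alpha, var_x) * cmath.exp(-2j * math.pi * f * k)
    # The imaginary parts cancel because R_Y(k) is even in k.
    return total.real


def ma_psd_closed_form(f, alpha, var_x):
    """Eq. (10.31): sigma_X^2 * ((1 + alpha^2) + 2 * alpha * cos(2 pi f))."""
    return var_x * ((1.0 + alpha ** 2) + 2.0 * alpha * math.cos(2.0 * math.pi * f))


if __name__ == "__main__":
    for f in (0.0, 0.1, 0.25, 0.5):
        print(f, ma_psd_from_autocorr(f, 1.0, 1.0), ma_psd_closed_form(f, 1.0, 1.0))
```

For $\alpha = 1$ the density is largest at $f = 0$ and falls to zero at $f = 1/2$, which is the lowpass shape one expects from averaging adjacent samples.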


Example 10.8 Signal Plus Noise 


Let the observation $Z_n$ be given by

$$Z_n = X_n + Y_n,$$

where $X_n$ is the signal we wish to observe, $Y_n$ is a white noise process with power $\sigma_Y^2$, and $X_n$ and $Y_n$ are independent random processes. Suppose further that $X_n = A$ for all $n$, where $A$ is a random variable with zero mean and variance $\sigma_A^2$. Thus $Z_n$ represents a sequence of noisy measurements of the random variable $A$. Find the power spectral density of $Z_n$.

The mean and autocorrelation of $Z_n$ are

$$E[Z_n] = E[X_n] + E[Y_n] = E[A] + E[Y_n] = 0$$

and

$$\begin{aligned}
E[Z_n Z_{n+k}] &= E[(X_n + Y_n)(X_{n+k} + Y_{n+k})] \\
&= E[X_n X_{n+k}] + E[X_n]E[Y_{n+k}] + E[X_{n+k}]E[Y_n] + E[Y_n Y_{n+k}] \\
&= E[A^2] + R_Y(k).
\end{aligned}$$

Thus $Z_n$ is also a WSS process.
The power spectral density of $Z_n$ is then

$$S_Z(f) = E[A^2]\,\delta(f) + S_Y(f),$$

where we have used the fact that the Fourier transform of a constant is a delta function.

FIGURE 10.3
Power spectral density of moving average process discussed in Example 10.7.

10.1.3 Power Spectral Density as a Time Average


In the above discussion, we simply stated that the power spectral density is given as the Fourier transform of the autocorrelation without supplying a proof. We now show how the power spectral density arises naturally when we take Fourier transforms of realizations of random processes.

Let $X_0, \ldots, X_{k-1}$ be $k$ observations from the discrete-time, WSS process $X_n$. Let $\tilde{x}_k(f)$ denote the discrete Fourier transform of this sequence:

$$\tilde{x}_k(f) = \sum_{m=0}^{k-1} X_m\, e^{-j2\pi f m}. \tag{10.32}$$

Note that $\tilde{x}_k(f)$ is a complex-valued random variable. The magnitude squared of $\tilde{x}_k(f)$ is a measure of the "energy" at the frequency $f$. If we divide this energy by the total "time" $k$, we obtain an estimate for the "power" at the frequency $f$:

$$\tilde{p}_k(f) = \frac{1}{k}|\tilde{x}_k(f)|^2. \tag{10.33}$$

$\tilde{p}_k(f)$ is called the periodogram estimate for the power spectral density.
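Consider the expected value of the periodogram estimate:

$$\begin{aligned}
E[\tilde{p}_k(f)] &= \frac{1}{k} E[\tilde{x}_k(f)\,\tilde{x}_k^*(f)] \\
&= \frac{1}{k} E\left[\sum_{m=0}^{k-1} X_m e^{-j2\pi f m} \sum_{i=0}^{k-1} X_i e^{j2\pi f i}\right] \\
&= \frac{1}{k} \sum_{m=0}^{k-1}\sum_{i=0}^{k-1} E[X_m X_i]\, e^{-j2\pi f(m-i)} \\
&= \frac{1}{k} \sum_{m=0}^{k-1}\sum_{i=0}^{k-1} R_X(m-i)\, e^{-j2\pi f(m-i)}.
\end{aligned} \tag{10.34}$$

Figure 10.4 shows the range of the double summation in Eq. (10.34). Note that all the terms along the diagonal $m' = m - i$ are equal, that $m'$ ranges from $-(k-1)$ to $k-1$, and that there are $k - |m'|$ terms along the diagonal $m' = m - i$. Thus Eq. (10.34) becomes

$$\begin{aligned}
E[\tilde{p}_k(f)] &= \frac{1}{k} \sum_{m'=-(k-1)}^{k-1} \{k - |m'|\}\, R_X(m')\, e^{-j2\pi f m'} \\
&= \sum_{m'=-(k-1)}^{k-1} \left\{1 - \frac{|m'|}{k}\right\} R_X(m')\, e^{-j2\pi f m'}.
\end{aligned} \tag{10.35}$$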
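The effect of the triangular weighting in Eq. (10.35) can be seen numerically. The sketch below evaluates the weighted sum for the moving average process of Example 10.7 (with the illustrative choice $\alpha = 0.5$, $\sigma_X^2 = 1$) and compares it with the power spectral density of Eq. (10.31). The function names and parameter values are assumptions made for illustration.

```python
import cmath
import math


def ma_autocorr(lag, alpha=0.5, var_x=1.0):
    """Eq. (10.30): autocorrelation of the moving average process."""
    if lag == 0:
        return (1.0 + alpha ** 2) * var_x
    if abs(lag) == 1:
        return alpha * var_x
    return 0.0


def ma_psd(f, alpha=0.5, var_x=1.0):
    """Eq. (10.31): power spectral density of the moving average process."""
    return var_x * ((1.0 + alpha ** 2) + 2.0 * alpha * math.cos(2.0 * math.pi * f))


def mean_periodogram(autocorr, k, f):
    """Eq. (10.35): sum over m' of (1 - |m'|/k) R_X(m') exp(-j 2 pi f m')."""
    total = 0j
    for lag in range(-(k - 1), k):
        weight = 1.0 - abs(lag) / k
        total += weight * autocorr(lag) * cmath.exp(-2j * math.pi * f * lag)
    return total.real


if __name__ == "__main__":
    f = 0.0
    for k in (10, 100, 1000):
        bias = mean_periodogram(ma_autocorr, k, f) - ma_psd(f)
        print(k, bias)
```

For this process only the lags $0$ and $\pm 1$ contribute, so the bias works out to $-(2/k)\,\alpha\sigma_X^2\cos 2\pi f$, which shrinks in proportion to $1/k$.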
Comparison of Eq. (10.35) with Eq. (10.24) shows that the mean of the periodogram estimate is not equal to $S_X(f)$ for two reasons. First, Eq. (10.24) does not have the term in brackets in Eq. (10.35). Second, the limits of the summation in Eq. (10.35) are not $\pm\infty$. We say that $\tilde{p}_k(f)$ is a "biased" estimator for $S_X(f)$. However, as $k \to \infty$, we see that the term in brackets approaches one, and that the limits of the summation approach $\pm\infty$. Thus

$$E[\tilde{p}_k(f)] \to S_X(f) \quad \text{as } k \to \infty, \tag{10.36}$$

that is, the mean of the periodogram estimate does indeed approach $S_X(f)$. Note that Eq. (10.36) shows that $S_X(f)$ is nonnegative for all $f$, since $\tilde{p}_k(f)$ is nonnegative for all $f$.

FIGURE 10.4
Range of summation in Eq. (10.34).

In order to be useful, the variance of the periodogram estimate should also approach zero. Determining whether it does involves looking more closely at the problem of power spectral density estimation. We defer this topic to Section 10.6.

All of the above results hold for a continuous-time WSS random process $X(t)$ after appropriate changes are made from summations to integrals. The periodogram estimate for $S_X(f)$, for an observation in the interval $0 < t < T$, was defined in Eq. (10.2). The same derivation that led to Eq. (10.35) can be used to show that the mean of the periodogram estimate is given by

$$E[\tilde{p}_T(f)] = \int_{-T}^{T} \left\{1 - \frac{|\tau|}{T}\right\} R_X(\tau)\, e^{-j2\pi f\tau}\, d\tau. \tag{10.37a}$$

It then follows that

$$E[\tilde{p}_T(f)] \to S_X(f) \quad \text{as } T \to \infty. \tag{10.37b}$$

10.2 RESPONSE OF LINEAR SYSTEMS TO RANDOM SIGNALS

Many applications involve the processing of random signals (i.e., random processes) in order to achieve certain ends. For example, in prediction, we are interested in predicting future values of a signal in terms of past values. In filtering and smoothing, we are interested in recovering signals that have been corrupted by noise. In modulation, we are interested in converting low-frequency information signals into high-frequency transmission signals that propagate more readily through various transmission media.

Signal processing involves converting a signal from one form into another. Thus a signal processing method is simply a transformation or mapping from one time function into another function. If the input to the transformation is a random process, then the output will also be a random process. In the next two sections, we are interested in determining the statistical properties of the output process when the input is a wide-sense stationary random process.

10.2.1 Continuous-Time Systems


Consider a system in which an input signal $x(t)$ is mapped into the output signal $y(t)$ by the transformation

$$y(t) = T[x(t)].$$

The system is linear if superposition holds, that is,

$$T[\alpha x_1(t) + \beta x_2(t)] = \alpha T[x_1(t)] + \beta T[x_2(t)],$$
where $x_1(t)$ and $x_2(t)$ are arbitrary input signals, and $\alpha$ and $\beta$ are arbitrary constants.⁴ Let $y(t)$ be the response to input $x(t)$; then the system is said to be time-invariant if the response to $x(t - \tau)$ is $y(t - \tau)$. The impulse response $h(t)$ of a linear, time-invariant system is defined by

$$h(t) = T[\delta(t)],$$

where $\delta(t)$ is a unit delta function input applied at $t = 0$. The response of the system to an arbitrary input $x(t)$ is then

$$y(t) = h(t) * x(t) = \int_{-\infty}^{\infty} h(s)x(t - s)\, ds = \int_{-\infty}^{\infty} h(t - s)x(s)\, ds. \tag{10.38}$$

Therefore a linear, time-invariant system is completely specified by its impulse response. The impulse response $h(t)$ can also be specified by giving its Fourier transform, the transfer function of the system:

$$H(f) = \mathcal{F}\{h(t)\} = \int_{-\infty}^{\infty} h(t)\, e^{-j2\pi f t}\, dt. \tag{10.39}$$

A system is said to be causal if the response at time $t$ depends only on past values of the input, that is, if $h(t) = 0$ for $t < 0$.

If the input to a linear, time-invariant system is a random process $X(t)$ as shown in Fig. 10.5, then the output of the system is the random process given by

$$Y(t) = \int_{-\infty}^{\infty} h(s)X(t - s)\, ds = \int_{-\infty}^{\infty} h(t - s)X(s)\, ds. \tag{10.40}$$

We assume that the integrals exist in the mean square sense as discussed in Section 9.7. We now show that if $X(t)$ is a wide-sense stationary process, then $Y(t)$ is also wide-sense stationary.⁵
The mean of $Y(t)$ is given by

$$E[Y(t)] = E\left[\int_{-\infty}^{\infty} h(\tau)X(t - \tau)\, d\tau\right] = \int_{-\infty}^{\infty} h(\tau)E[X(t - \tau)]\, d\tau.$$

FIGURE 10.5
A linear system with a random input signal.

⁴ For examples of nonlinear systems see Problems 9.11 and 9.56.

⁵ Equation (10.40) supposes that the input was applied at an infinite time in the past. If the input is applied at $t = 0$, then $Y(t)$ is not wide-sense stationary. However, it becomes wide-sense stationary as the response reaches "steady state" (see Example 9.46 and Problem 10.29).

Now $m_X = E[X(t - \tau)]$ since $X(t)$ is wide-sense stationary, so

$$E[Y(t)] = m_X \int_{-\infty}^{\infty} h(\tau)\, d\tau = m_X H(0), \tag{10.41}$$

where $H(f)$ is the transfer function of the system. Thus the mean of the output $Y(t)$ is the constant $m_Y = H(0)m_X$.
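The autocorrelation of $Y(t)$ is given by

$$\begin{aligned}
E[Y(t)Y(t+\tau)] &= E\left[\int_{-\infty}^{\infty} h(s)X(t - s)\, ds \int_{-\infty}^{\infty} h(r)X(t + \tau - r)\, dr\right] \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)E[X(t - s)X(t + \tau - r)]\, ds\, dr \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)R_X(\tau + s - r)\, ds\, dr,
\end{aligned} \tag{10.42}$$

where we have used the fact that $X(t)$ is wide-sense stationary. The expression on the right-hand side of Eq. (10.42) depends only on $\tau$. Thus the autocorrelation of $Y(t)$ depends only on $\tau$, and since $E[Y(t)]$ is a constant, we conclude that $Y(t)$ is a wide-sense stationary process.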

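We are now ready to compute the power spectral density of the output of a linear, time-invariant system. Taking the transform of $R_Y(\tau)$ as given in Eq. (10.42), we obtain

$$\begin{aligned}
S_Y(f) &= \int_{-\infty}^{\infty} R_Y(\tau)\, e^{-j2\pi f\tau}\, d\tau \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)R_X(\tau + s - r)\, e^{-j2\pi f\tau}\, ds\, dr\, d\tau.
\end{aligned}$$

Change variables, letting $u = \tau + s - r$:

$$\begin{aligned}
S_Y(f) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)R_X(u)\, e^{-j2\pi f(u - s + r)}\, ds\, dr\, du \\
&= \int_{-\infty}^{\infty} h(s)\, e^{j2\pi f s}\, ds \int_{-\infty}^{\infty} h(r)\, e^{-j2\pi f r}\, dr \int_{-\infty}^{\infty} R_X(u)\, e^{-j2\pi f u}\, du \\
&= H^*(f)H(f)S_X(f) \\
&= |H(f)|^2 S_X(f),
\end{aligned} \tag{10.43}$$

where we have used the definition of the transfer function. Equation (10.43) relates the input and output power spectral densities to the system transfer function. Note that $R_Y(\tau)$ can also be found by computing Eq. (10.43) and then taking the inverse Fourier transform.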

Equations (10.41) through (10.43) only enable us to determine the mean and autocorrelation function of the output process $Y(t)$. In general this is not enough to determine probabilities of events involving $Y(t)$. However, if the input process is a Gaussian WSS random process, then as discussed in Section 9.7 the output process will also be a Gaussian WSS random process. Thus the mean and autocorrelation function provided by Eqs. (10.41) through (10.43) are enough to determine all joint pdf's involving the Gaussian random process $Y(t)$.

The cross-correlation between the input and output processes is also of interest:

$$\begin{aligned}
R_{Y,X}(\tau) &= E[Y(t + \tau)X(t)] = \int_{-\infty}^{\infty} h(r)R_X(\tau - r)\, dr \\
&= R_X(\tau) * h(\tau).
\end{aligned} \tag{10.44}$$

By taking the Fourier transform, we obtain the cross-power spectral density:

$$S_{Y,X}(f) = H(f)S_X(f). \tag{10.45a}$$

Since $R_{X,Y}(\tau) = R_{Y,X}(-\tau)$, we have that

$$S_{X,Y}(f) = S_{Y,X}^*(f) = H^*(f)S_X(f). \tag{10.45b}$$


Example 10.9 Filtered White Noise 


Find the power spectral density of the output of a linear, time-invariant system whose input is a white noise process.
Let $X(t)$ be the input process with power spectral density

$$S_X(f) = \frac{N_0}{2} \quad \text{for all } f.$$

The power spectral density of the output $Y(t)$ is then

$$S_Y(f) = |H(f)|^2\, \frac{N_0}{2}. \tag{10.46}$$

Thus the transfer function completely determines the shape of the power spectral density of the output process.


Example 10.9 provides us with a method for generating WSS processes with arbitrary power spectral density $S_Y(f)$. We simply need to filter white noise through a filter with transfer function $H(f) = \sqrt{S_Y(f)}$. In general this filter will be noncausal. We can usually, but not always, obtain a causal filter with transfer function $H(f)$ such that $S_Y(f) = H(f)H^*(f)$. For example, if $S_Y(f)$ is a rational function, that is, if it consists of the ratio of two polynomials, then it is easy to factor $S_Y(f)$ into the above form, as shown in the next example. Furthermore, any power spectral density can be approximated by a rational function. Thus filtered white noise can be used to synthesize WSS random processes with arbitrary power spectral densities, and hence arbitrary autocorrelation functions.


Example 10.10 Ornstein-Uhlenbeck Process 


Find the impulse response of a causal filter that can be used to generate a Gaussian random process with output power spectral density and autocorrelation function

$$S_Y(f) = \frac{\sigma^2}{\alpha^2 + 4\pi^2 f^2} \quad \text{and} \quad R_Y(\tau) = \frac{\sigma^2}{2\alpha}\, e^{-\alpha|\tau|}.$$

This power spectral density factors as follows:

$$S_Y(f) = \frac{1}{(\alpha - j2\pi f)}\,\frac{1}{(\alpha + j2\pi f)}\,\sigma^2.$$

If we let the filter transfer function be $H(f) = 1/(\alpha + j2\pi f)$, then the impulse response is

$$h(t) = e^{-\alpha t} \quad \text{for } t \ge 0,$$

which is the response of a causal system. Thus if we filter white Gaussian noise with power spectral density $\sigma^2$ using the above filter, we obtain a process with the desired power spectral density.

In Example 9.46, we found the autocorrelation function of the transient response of this filter for a white Gaussian noise input (see Eq. (9.97a)). As was already indicated, when dealing with power spectral densities we assume that the processes are in steady state. Thus as $t \to \infty$ Eq. (9.97a) approaches Eq. (9.97b).
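The factorization in Example 10.10 can be verified numerically. The following sketch, under the assumption that the filter is $H(f) = 1/(\alpha + j2\pi f)$ with impulse response $h(t) = e^{-\alpha t}$ for $t \ge 0$, checks two things: that $|H(f)|^2\sigma^2$ reproduces the target density, and that a midpoint-rule Fourier transform of $h(t)$ agrees with $H(f)$. Function names, the truncation point, and the step count are illustrative choices.

```python
import cmath
import math


def ou_transfer(f, alpha):
    """H(f) = 1 / (alpha + j 2 pi f)."""
    return 1.0 / (alpha + 2j * math.pi * f)


def ou_output_psd(f, alpha, sigma2):
    """Eq. (10.43) with a white input of density sigma2: |H(f)|^2 * sigma2."""
    return abs(ou_transfer(f, alpha)) ** 2 * sigma2


def ou_target_psd(f, alpha, sigma2):
    """Target density sigma2 / (alpha^2 + 4 pi^2 f^2)."""
    return sigma2 / (alpha ** 2 + 4.0 * math.pi ** 2 * f ** 2)


def ou_transfer_numeric(f, alpha, t_max=40.0, steps=200_000):
    """Midpoint-rule approximation of Eq. (10.39) for h(t) = exp(-alpha t), t >= 0.

    The integral is truncated at t_max, which assumes alpha * t_max is large
    enough for the tail to be negligible.
    """
    dt = t_max / steps
    total = 0j
    for i in range(steps):
        t = (i + 0.5) * dt
        total += math.exp(-alpha * t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt


if __name__ == "__main__":
    alpha, sigma2 = 1.0, 2.0
    for f in (0.0, 0.3, 1.0):
        print(f, ou_output_psd(f, alpha, sigma2), ou_target_psd(f, alpha, sigma2))
    print(ou_transfer_numeric(0.3, alpha), ou_transfer(0.3, alpha))
```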


Example 10.11 Ideal Filters 


Let $Z(t) = X(t) + Y(t)$, where $X(t)$ and $Y(t)$ are independent random processes with power spectral densities shown in Fig. 10.6(a). Find the output if $Z(t)$ is input into an ideal lowpass filter with transfer function shown in Fig. 10.6(b). Find the output if $Z(t)$ is input into an ideal bandpass filter with transfer function shown in Fig. 10.6(c).

The power spectral density of the output $W(t)$ of the lowpass filter is

$$S_W(f) = |H_{LP}(f)|^2 S_X(f) + |H_{LP}(f)|^2 S_Y(f) = S_X(f),$$

since $H_{LP}(f) = 1$ for the frequencies where $S_X(f)$ is nonzero, and $H_{LP}(f) = 0$ where $S_Y(f)$ is nonzero. Thus $W(t)$ has the same power spectral density as $X(t)$. As indicated in Example 10.5, this does not imply that $W(t) = X(t)$.

To show that $W(t) = X(t)$ in the mean square sense, consider $D(t) = W(t) - X(t)$. It is easily shown that

$$R_D(\tau) = R_W(\tau) - R_{WX}(\tau) - R_{XW}(\tau) + R_X(\tau).$$

The corresponding power spectral density is

$$\begin{aligned}
S_D(f) &= S_W(f) - S_{WX}(f) - S_{XW}(f) + S_X(f) \\
&= |H_{LP}(f)|^2 S_X(f) - H_{LP}(f)S_X(f) - H_{LP}^*(f)S_X(f) + S_X(f) \\
&= 0.
\end{aligned}$$

FIGURE 10.6
(a) Input signal to filters is $X(t) + Y(t)$, (b) lowpass filter, (c) bandpass filter.

Therefore $R_D(\tau) = 0$ for all $\tau$, and $W(t) = X(t)$ in the mean square sense since

$$E[(W(t) - X(t))^2] = E[D^2(t)] = R_D(0) = 0.$$

Thus we have shown that the lowpass filter removes $Y(t)$ and passes $X(t)$. Similarly, the bandpass filter removes $X(t)$ and passes $Y(t)$.


Example 10.12 
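A random telegraph signal is passed through an RC lowpass filter which has transfer function

$$H(f) = \frac{\beta}{\beta + j2\pi f},$$

where $\beta = 1/RC$ is the reciprocal of the time constant $RC$ of the filter. Find the power spectral density and autocorrelation of the output.

In Example 10.1, the power spectral density of the random telegraph signal with transition rate $\alpha$ was found to be

$$S_X(f) = \frac{4\alpha}{4\alpha^2 + 4\pi^2 f^2}.$$

From Eq. (10.43) we have

$$\begin{aligned}
S_Y(f) &= \left(\frac{\beta^2}{\beta^2 + 4\pi^2 f^2}\right)\left(\frac{4\alpha}{4\alpha^2 + 4\pi^2 f^2}\right) \\
&= \frac{4\alpha\beta^2}{\beta^2 - 4\alpha^2}\left\{\frac{1}{4\alpha^2 + 4\pi^2 f^2} - \frac{1}{\beta^2 + 4\pi^2 f^2}\right\}.
\end{aligned}$$

$R_Y(\tau)$ is found by inverting the above expression:

$$R_Y(\tau) = \frac{1}{\beta^2 - 4\alpha^2}\left\{\beta^2 e^{-2\alpha|\tau|} - 2\alpha\beta\, e^{-\beta|\tau|}\right\}.$$

10.2.2 Discrete-Time Systems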


The results obtained above for continuous-time signals also hold for discrete-time signals after appropriate changes are made from integrals to summations.

Let the unit-sample response $h_n$ be the response of a discrete-time, linear, time-invariant system to a unit-sample input $\delta_n$:

$$\delta_n = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0. \end{cases} \tag{10.47}$$

The response of the system to an arbitrary input random process $X_n$ is then given by

$$Y_n = h_n * X_n = \sum_{j=-\infty}^{\infty} h_j X_{n-j} = \sum_{j=-\infty}^{\infty} h_{n-j} X_j. \tag{10.48}$$

Thus discrete-time, linear, time-invariant systems are determined by the unit-sample response $h_n$. The transfer function of such a system is defined by

$$H(f) = \sum_{i=-\infty}^{\infty} h_i\, e^{-j2\pi f i}. \tag{10.49}$$

The derivation from the previous section can be used to show that if $X_n$ is a wide-sense stationary process, then $Y_n$ is also wide-sense stationary. The mean of $Y_n$ is given by

$$m_Y = m_X \sum_{j=-\infty}^{\infty} h_j = m_X H(0). \tag{10.50}$$

The autocorrelation of $Y_n$ is given by

$$R_Y(k) = \sum_{j=-\infty}^{\infty}\sum_{i=-\infty}^{\infty} h_j h_i R_X(k + j - i). \tag{10.51}$$
By taking the Fourier transform of $R_Y(k)$ it is readily shown that the power spectral density of $Y_n$ is

$$S_Y(f) = |H(f)|^2 S_X(f). \tag{10.52}$$

This is the same equation that was found for continuous-time systems.

Finally, we note that if the input process $X_n$ is a Gaussian WSS random process, then the output process $Y_n$ is also a Gaussian WSS random process whose statistics are completely determined by the mean and autocorrelation function provided by Eqs. (10.50) through (10.52).


Example 10.13 Filtered White Noise 


Let $X_n$ be a white noise sequence with zero mean and average power $\sigma_X^2$. If $X_n$ is the input to a linear, time-invariant system with transfer function $H(f)$, then the output process $Y_n$ has power spectral density:

$$S_Y(f) = |H(f)|^2 \sigma_X^2. \tag{10.53}$$

Equation (10.53) provides us with a method for generating discrete-time random processes with arbitrary power spectral densities or autocorrelation functions. If the power spectral density can be written as a rational function of $z = e^{j2\pi f}$ in Eq. (10.24), then a causal filter can be found to generate a process with the power spectral density. Note that this is a generalization of the methods presented in Section 6.6 for generating vector random variables with arbitrary covariance matrix.


Example 10.14 First-Order Autoregressive Process 


A first-order autoregressive (AR) process $Y_n$ with zero mean is defined by

$$Y_n = \alpha Y_{n-1} + X_n, \tag{10.54}$$

where $X_n$ is a zero-mean white noise input random process with average power $\sigma_X^2$. Note that $Y_n$ can be viewed as the output of the system in Fig. 10.7(a) for an iid input $X_n$. Find the power spectral density and autocorrelation of $Y_n$.
The unit-sample response can be determined from Eq. (10.54):

$$h_n = \begin{cases} 0 & n < 0 \\ 1 & n = 0 \\ \alpha^n & n > 0. \end{cases}$$

Note that we require $|\alpha| < 1$ for the system to be stable.⁶ Therefore the transfer function is

$$H(f) = \sum_{n=0}^{\infty} \alpha^n e^{-j2\pi f n} = \frac{1}{1 - \alpha e^{-j2\pi f}}.$$

⁶ A system is said to be stable if $\sum_n |h_n| < \infty$. The response of a stable system to any bounded input is also bounded.

FIGURE 10.7
(a) Generation of AR process; (b) Generation of ARMA process.

Equation (10.52) then gives

$$\begin{aligned}
S_Y(f) &= \frac{\sigma_X^2}{(1 - \alpha e^{-j2\pi f})(1 - \alpha e^{j2\pi f})} \\
&= \frac{\sigma_X^2}{1 + \alpha^2 - (\alpha e^{-j2\pi f} + \alpha e^{j2\pi f})} \\
&= \frac{\sigma_X^2}{1 + \alpha^2 - 2\alpha\cos 2\pi f}.
\end{aligned}$$

Equation (10.51) gives

$$R_Y(k) = \sum_{j=0}^{\infty}\sum_{i=0}^{\infty} h_j h_i\, \sigma_X^2\, \delta_{k+j-i} = \sigma_X^2 \sum_{j=0}^{\infty} \alpha^j \alpha^{j+|k|} = \frac{\sigma_X^2\, \alpha^{|k|}}{1 - \alpha^2}.$$
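The two results of Example 10.14 can be cross-checked against each other. The sketch below assumes the autocorrelation $R_Y(k) = \sigma_X^2\alpha^{|k|}/(1 - \alpha^2)$, which follows from Eq. (10.51) with the unit-sample response $h_n = \alpha^n$, $n \ge 0$, and evaluates a truncated version of the sum in Eq. (10.24). The truncation lag and the function names are illustrative assumptions.

```python
import cmath
import math


def ar1_autocorr(k, alpha, var_x):
    """R_Y(k) = var_x * alpha^|k| / (1 - alpha^2), valid for |alpha| < 1."""
    return var_x * alpha ** abs(k) / (1.0 - alpha ** 2)


def ar1_psd(f, alpha, var_x):
    """S_Y(f) = var_x / (1 + alpha^2 - 2 alpha cos(2 pi f))."""
    return var_x / (1.0 + alpha ** 2 - 2.0 * alpha * math.cos(2.0 * math.pi * f))


def ar1_psd_from_autocorr(f, alpha, var_x, max_lag=200):
    """Eq. (10.24) truncated to |k| <= max_lag.

    The truncation is an approximation; it is accurate when alpha^max_lag
    is negligible.
    """
    total = 0j
    for k in range(-max_lag, max_lag + 1):
        total += ar1_autocorr(k, alpha, var_x) * cmath.exp(-2j * math.pi * f * k)
    return total.real


if __name__ == "__main__":
    for alpha in (0.1, 0.75):
        for f in (0.0, 0.25, 0.5):
            print(alpha, f,
                  ar1_psd(f, alpha, 1.0),
                  ar1_psd_from_autocorr(f, alpha, 1.0))
```

A larger $\alpha$ concentrates more of the power near $f = 0$, which corresponds to stronger correlation between adjacent samples.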


Example 10.15 ARMA Random Process 
An autoregressive moving average (ARMA) process is defined by

$$Y_n = -\sum_{i=1}^{q} \alpha_i Y_{n-i} + \sum_{i'=0}^{p} \beta_{i'} W_{n-i'}, \tag{10.55}$$

where $W_n$ is a WSS, white noise input process. $Y_n$ can be viewed as the output of the recursive system in Fig. 10.7(b) to the input $W_n$. It can be shown that the transfer function of the linear system defined by the above equation is

$$H(f) = \frac{\displaystyle\sum_{i'=0}^{p} \beta_{i'}\, e^{-j2\pi f i'}}{\displaystyle 1 + \sum_{i=1}^{q} \alpha_i\, e^{-j2\pi f i}}.$$

The power spectral density of the ARMA process is

$$S_Y(f) = |H(f)|^2 \sigma_W^2.$$

ARMA models are used extensively in random time series analysis and in signal processing. The general autoregressive process is the special case of the ARMA process with $\beta_1 = \beta_2 = \cdots = \beta_p = 0$. The general moving average process is the special case of the ARMA process with $\alpha_1 = \alpha_2 = \cdots = \alpha_q = 0$. Octave has a function filter(b, a, x) which takes a set of coefficients $b = (\beta_1, \beta_2, \ldots, \beta_{p+1})$ and $a = (\alpha_1, \alpha_2, \ldots, \alpha_q)$ as coefficients for a filter as in Eq. (10.55) and produces the output corresponding to the input sequence x. The choice of a and b can lead to a broad range of discrete-time filters.
For example, if we let $b = (1/N, 1/N, \ldots, 1/N)$ we obtain a moving average filter:

$$Y_n = (W_n + W_{n-1} + \cdots + W_{n-N+1})/N.$$

Figure 10.8 shows a zero-mean, unit-variance Gaussian iid sequence $W_n$ and the outputs from an $N = 3$ and an $N = 10$ moving average filter. It can be seen that the $N = 3$ filter moderates the extreme variations but generally tracks the fluctuations in $W_n$. The $N = 10$ filter, on the other hand, severely limits the variations and only tracks slower, longer-lasting trends.
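The recursion in Eq. (10.55) is short enough to write out directly. The following is a minimal sketch of such a difference-equation filter in plain Python; `arma_filter` is a hypothetical helper written for illustration, not Octave's `filter`, and it assumes that all samples before $n = 0$ are zero. Its arguments follow the indexing of Eq. (10.55): `beta` holds $\beta_0, \ldots, \beta_p$ and `alpha` holds $\alpha_1, \ldots, \alpha_q$.

```python
def arma_filter(beta, alpha, w):
    """Apply Eq. (10.55) to the input sequence w.

    Y_n = -sum_{i=1}^{q} alpha_i * Y_{n-i} + sum_{i=0}^{p} beta_i * W_{n-i}

    beta  -- list [beta_0, ..., beta_p] of feed-forward coefficients
    alpha -- list [alpha_1, ..., alpha_q] of feedback coefficients
    w     -- list of input samples W_0, W_1, ...
    Samples with negative index are taken to be zero (an assumption).
    """
    y = []
    for n in range(len(w)):
        acc = 0.0
        # Moving average (feed-forward) part.
        for i, b in enumerate(beta):
            if n - i >= 0:
                acc += b * w[n - i]
        # Autoregressive (feedback) part, using outputs already computed.
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * y[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    # N = 3 moving average filter: all alpha_i = 0, beta_i = 1/3.
    print(arma_filter([1 / 3, 1 / 3, 1 / 3], [], [3.0, 0.0, 0.0, 3.0, 3.0, 3.0]))
    # First-order AR filter of Eq. (10.54) with coefficient 0.5; in the sign
    # convention of Eq. (10.55) this corresponds to alpha_1 = -0.5.
    print(arma_filter([1.0], [-0.5], [1.0, 0.0, 0.0, 0.0]))
```

Feeding a unit sample into the AR configuration returns the unit-sample response, which gives a simple way to sanity-check a chosen set of coefficients before applying the filter to a noise sequence.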


Figures 10.9(a) and (b) show the result of passing an iid Gaussian sequence $X_n$ through first-order autoregressive filters as in Eq. (10.54). The AR sequence with $\alpha = 0.1$ has low correlation between adjacent samples, and so the sequence remains similar to the underlying iid random process. The AR sequence with $\alpha = 0.75$ has higher correlation between adjacent samples, which tends to cause longer-lasting trends, as is evident in Fig. 10.9(b).

FIGURE 10.8
Moving average process showing iid Gaussian sequence and corresponding $N = 3$, $N = 10$ moving average processes.

FIGURE 10.9
(a) First-order autoregressive process with $\alpha = 0.1$; (b) with $\alpha = 0.75$.

10.3 BANDLIMITED RANDOM PROCESSES

In this section we consider two important applications that involve random processes with power spectral densities that are nonzero over a finite range of frequencies. The first application involves the sampling theorem, which states that bandlimited random processes can be represented in terms of a sequence of their time samples. This theorem forms the basis for modern digital signal processing systems. The second application involves the modulation of sinusoidal signals by random information signals. Modulation is a key element of all modern communication systems.

10.3.1 Sampling of Bandlimited Random Processes


One of the major technology advances in the twentieth century was the development of digital signal processing technology. All modern multimedia systems depend in some way on the processing of digital signals. Many information signals, e.g., voice, music, imagery, occur naturally as analog signals that are continuous-valued and that vary continuously in time or space or both. The two key steps in making these signals amenable to digital signal processing are: (1) converting the continuous-time signals into discrete-time signals by sampling the amplitudes; (2) representing the samples using a fixed number of bits. In this section we introduce the sampling theorem for wide-sense stationary bandlimited random processes, which addresses the conversion of signals into discrete-time sequences.

Let $x(t)$ be a deterministic, finite-energy time signal that has Fourier transform $X(f) = \mathcal{F}\{x(t)\}$ that is nonzero only in the frequency range $|f| \le W$. Suppose we sample $x(t)$ every $T$ seconds to obtain the sequence of sample values: $\{\ldots, x(-2T), x(-T), x(0), x(T), \ldots\}$. The sampling theorem for deterministic signals states that $x(t)$ can be recovered exactly from the sequence of samples if $T \le 1/2W$ or equivalently $1/T \ge 2W$, that is, the sampling rate is at least twice the bandwidth of the signal. The minimum sampling rate $2W$ is called the Nyquist sampling rate. The sampling theorem provides the following interpolation formula for recovering $x(t)$ from the samples:

$$x(t) = \sum_{n=-\infty}^{\infty} x(nT)\, p(t - nT) \quad \text{where} \quad p(t) = \frac{\sin(\pi t/T)}{\pi t/T}. \tag{10.56}$$

FIGURE 10.10
(a) Sampling and interpolation; (b) Fourier transform of sampled deterministic signal; (c) Sampling, digital filtering, and interpolation.

Eq. (10.56) provides us with the interesting interpretation depicted in Fig. 10.10(a). The process of sampling $x(t)$ can be viewed as the multiplication of $x(t)$ by a train of delta functions spaced $T$ seconds apart. The sampled function is then represented by:

$$x_s(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT). \tag{10.57}$$

Eq. (10.56) can be viewed as the response of a linear system with impulse response $p(t)$ to the signal $x_s(t)$. It is easy to show that the $p(t)$ in Eq. (10.56) corresponds to the ideal lowpass filter in Fig. 10.6:

$$P(f) = \mathcal{F}\{p(t)\} = \begin{cases} T & |f| \le 1/2T \\ 0 & |f| > 1/2T. \end{cases}$$
The proof of the sampling theorem involves the following steps. We show that

$$\mathcal{F}\left\{\sum_{n=-\infty}^{\infty} x(nT)\, p(t - nT)\right\} = \frac{1}{T}\, P(f) \sum_{k=-\infty}^{\infty} X\!\left(f - \frac{k}{T}\right), \tag{10.58}$$

which consists of the sum of translated versions of $X(f) = \mathcal{F}\{x(t)\}$, as shown in Fig. 10.10(b). We then observe that as long as $1/T \ge 2W$, then $P(f)$ in the above expressions selects the $k = 0$ term in the summation, which corresponds to $X(f)$. See Problem 10.45 for details.
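The interpolation formula can be tried out on a simple finite-energy signal. The sketch below uses the illustrative test signal $x(t) = (\sin \pi t/\pi t)^2$, whose spectrum is a triangle confined to $|f| \le 1$ (so $W = 1$), samples it with $T = 0.4$ (a rate of 2.5 samples per second, above $2W$), and evaluates a truncated version of the interpolation sum. The signal, the truncation limit `n_max`, and the function names are assumptions made for this example.

```python
import math


def sinc_pulse(t, T):
    """p(t) = sin(pi t / T) / (pi t / T), with p(0) = 1."""
    u = math.pi * t / T
    if abs(u) < 1e-12:
        return 1.0
    return math.sin(u) / u


def bandlimited_signal(t):
    """Test signal x(t) = (sin(pi t) / (pi t))^2, bandlimited to |f| <= 1."""
    return sinc_pulse(t, 1.0) ** 2


def sinc_interpolate(signal, t, T, n_max=500):
    """Truncated interpolation sum: sum over |n| <= n_max of x(nT) p(t - nT).

    The infinite sum is cut off at n_max, so the result is an approximation
    whose error depends on how quickly the samples x(nT) decay.
    """
    total = 0.0
    for n in range(-n_max, n_max + 1):
        total += signal(n * T) * sinc_pulse(t - n * T, T)
    return total


if __name__ == "__main__":
    T = 0.4  # sampling interval; 1/T = 2.5 exceeds 2W = 2
    for t in (0.3, 0.8, 1.13):
        print(t, bandlimited_signal(t), sinc_interpolate(bandlimited_signal, t, T))
```

Repeating the experiment with $T > 1/2W$ (for example $T = 0.8$) should show a visible mismatch, since the translated copies of $X(f)$ then overlap.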


Example 10.16 Sampling a WSS Random Process 


Let $X(t)$ be a WSS process with autocorrelation function $R_X(\tau)$. Find the mean and covariance functions of the discrete-time sampled process $X_n = X(nT)$ for $n = 0, \pm 1, \pm 2, \ldots$.
Since $X(t)$ is WSS, the mean and covariance functions are:

$$m_X(n) = E[X(nT)] = m$$

$$E[X_{n_1} X_{n_2}] = E[X(n_1 T)X(n_2 T)] = R_X(n_1 T - n_2 T) = R_X((n_1 - n_2)T).$$

This shows $X_n$ is a WSS discrete-time process.


Let $X(t)$ be a WSS process with autocorrelation function $R_X(\tau)$ and power spectral density $S_X(f)$. Suppose that $S_X(f)$ is bandlimited, that is,

$$S_X(f) = 0 \qquad |f| > W.$$

We now show that the sampling theorem can be extended to $X(t)$. Let

$$\hat{X}(t) = \sum_{n=-\infty}^{\infty} X(nT)\, p(t - nT) \quad \text{where} \quad p(t) = \frac{\sin(\pi t/T)}{\pi t/T}; \tag{10.59}$$

then $\hat{X}(t) = X(t)$ in the mean square sense. Recall that equality in the mean square sense does not imply equality for all sample functions, so this version of the sampling theorem is weaker than the version in Eq. (10.56) for finite energy signals.

To show Eq. (10.59) we first note that since $S_X(f) = \mathcal{F}\{R_X(\tau)\}$, we can apply the sampling theorem for deterministic signals to $R_X(\tau)$:

$$R_X(\tau) = \sum_{n=-\infty}^{\infty} R_X(nT)\, p(\tau - nT). \tag{10.60}$$

Next we consider the mean square error associated with Eq. (10.59):

$$\begin{aligned}
E[\{X(t) - \hat{X}(t)\}^2] &= E[\{X(t) - \hat{X}(t)\}X(t)] - E[\{X(t) - \hat{X}(t)\}\hat{X}(t)] \\
&= \{E[X(t)X(t)] - E[\hat{X}(t)X(t)]\} - \{E[X(t)\hat{X}(t)] - E[\hat{X}(t)\hat{X}(t)]\}.
\end{aligned}$$

It is easy to show that Eq. (10.60) implies that each of the terms in braces is equal to zero. (See Problem 10.48.) We then conclude that $\hat{X}(t) = X(t)$ in the mean square sense.
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Example 10.17 Digital Filtering of a Sampled WSS Random Process 


Let $X(t)$ be a WSS process with power spectral density $S_X(f)$ that is nonzero only for $|f| \le W$.
Consider the sequence of operations shown in Fig. 10.10(c): (1) $X(t)$ is sampled at the Nyquist rate;
(2) the samples $X(nT)$ are input into a digital filter in Fig. 10.7(b) with $\alpha_1 = \alpha_2 = \cdots = \alpha_q = 0$;
and (3) the resulting output sequence $Y_n$ is fed into the interpolation filter. Find the power spectral
density of the output $Y(t)$.

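The output of the digital filter is given by:

$$ Y_k = \sum_{n=0}^{p} \beta_n X((k - n)T) $$

and the corresponding autocorrelation from Eq. (10.51) is:

$$ R_Y(k) = \sum_{n=0}^{p}\sum_{i=0}^{p} \beta_n\beta_i R_X((k + n - i)T). $$

The autocorrelation of $Y(t)$ is found from the interpolation formula (Eq. 10.60):

$$ \begin{aligned}
R_Y(\tau) &= \sum_{k=-\infty}^{\infty} R_Y(kT)\,p(\tau - kT) = \sum_{k=-\infty}^{\infty}\sum_{n=0}^{p}\sum_{i=0}^{p} \beta_n\beta_i R_X((k + n - i)T)\,p(\tau - kT) \\
&= \sum_{n=0}^{p}\sum_{i=0}^{p} \beta_n\beta_i \left\{\sum_{k=-\infty}^{\infty} R_X((k + n - i)T)\,p(\tau - kT)\right\} \\
&= \sum_{n=0}^{p}\sum_{i=0}^{p} \beta_n\beta_i R_X(\tau + (n - i)T).
\end{aligned} $$

The output power spectral density is then:

$$ \begin{aligned}
S_Y(f) = \mathcal{F}\{R_Y(\tau)\} &= \sum_{n=0}^{p}\sum_{i=0}^{p} \beta_n\beta_i\,\mathcal{F}\{R_X(\tau + (n - i)T)\} \\
&= \sum_{n=0}^{p}\sum_{i=0}^{p} \beta_n\beta_i S_X(f)\,e^{-j2\pi f(n - i)T} \\
&= \left\{\sum_{n=0}^{p} \beta_n e^{-j2\pi fnT}\right\}\left\{\sum_{i=0}^{p} \beta_i e^{j2\pi fiT}\right\} S_X(f) \\
&= |H(fT)|^2 S_X(f)
\end{aligned} \tag{10.61} $$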


where $H(f)$ is the transfer function of the digital filter as per Eq. (10.49). The key finding here is the
appearance of $H(f)$ evaluated at $fT$. We have obtained a very nice result that characterizes the overall
system response in Fig. 10.8 to the continuous-time input $X(t)$. This result is true for more general
digital filters; see [Oppenheim and Schafer].
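
As an illustration of Eq. (10.61), the sketch below evaluates the gain $|H(fT)|^2$ for an assumed two-tap averaging filter, $\beta_0 = \beta_1 = 1/2$, with an assumed sampling interval $T = 1$ ms. These values are not from the text. For this particular filter the gain reduces to $\cos^2(\pi fT)$, which the code uses as a cross-check.

```octave
% Sketch of Eq. (10.61) for an assumed two-tap averaging filter.
beta = [0.5 0.5];                        % beta_0, beta_1 (illustrative choice)
T    = 1e-3;                             % sampling interval in seconds (assumed)
f    = linspace(-1/(2*T), 1/(2*T), 201); % frequencies with |f| <= W = 1/(2T)
n    = (0:numel(beta)-1).';              % tap indices as a column vector

% Transfer function of the digital filter evaluated at fT
H    = beta * exp(-1j*2*pi*n*(f*T));     % 1 x 201 row vector
gain = abs(H).^2;                        % factor multiplying S_X(f) in Eq. (10.61)

% Closed form for this filter: cos^2(pi f T)
max_dev = max(abs(gain - cos(pi*f*T).^2))
```

The output spectrum is then `gain .* Sx` for whatever bandlimited `Sx` is sampled on the same frequency grid.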


The sampling theorem provides an important bridge between continuous-time 
and discrete-time signal processing. It gives us a means for implementing the real as well 
as the simulated processing of random signals. First, we must sample the random 
process above its Nyquist sampling rate. We can then perform whatever digital process- 
ing is necessary. We can finally recover the continuous-time signal by interpolation. The 
only difference between real signal processing and simulated signal processing is that 
the former usually has real-time requirements, whereas the latter allows us to perform 
our processing at whatever rate is possible using the available computing power.


10.3.2 Amplitude Modulation by Random Signals


Many of the transmission media used in communication systems can be modeled as 
linear systems and their behavior can be specified by a transfer function H(f), which 
passes certain frequencies and rejects others. Quite often the information signal A(t) 
(i.e., a speech or music signal) is not at the frequencies that propagate well. The pur- 
pose of a modulator is to map the information signal A(t) into a transmission signal 
X(t) that is in a frequency range that propagates well over the desired medium. At the 
receiver, we need to perform an inverse mapping to recover A(t) from X(t). In this sec- 
tion, we discuss two of the amplitude modulation methods. 

Let $A(t)$ be a WSS random process that represents an information signal. In general
$A(t)$ will be “lowpass” in character, that is, its power spectral density will be concentrated
at low frequencies, as shown in Fig. 10.11(a). An amplitude modulation
(AM) system produces a transmission signal by multiplying $A(t)$ by a “carrier” signal
$\cos(2\pi f_c t + \Theta)$:


$$ X(t) = A(t)\cos(2\pi f_c t + \Theta), \tag{10.62} $$


where we assume $\Theta$ is a random variable that is uniformly distributed in the interval
$(0, 2\pi)$, and $\Theta$ and $A(t)$ are independent.
FIGURE 10.11
(a) A lowpass information signal; (b) an amplitude-modulated signal.


The autocorrelation of $X(t)$ is

$$ \begin{aligned}
E[X(t + \tau)X(t)] &= E[A(t + \tau)\cos(2\pi f_c(t + \tau) + \Theta)\,A(t)\cos(2\pi f_c t + \Theta)] \\
&= E[A(t + \tau)A(t)]\,E[\cos(2\pi f_c(t + \tau) + \Theta)\cos(2\pi f_c t + \Theta)] \\
&= R_A(\tau)\,E\left[\tfrac{1}{2}\cos(2\pi f_c\tau) + \tfrac{1}{2}\cos(2\pi f_c(2t + \tau) + 2\Theta)\right] \\
&= \tfrac{1}{2}R_A(\tau)\cos(2\pi f_c\tau),
\end{aligned} \tag{10.63} $$


where we used the fact that $E[\cos(2\pi f_c(2t + \tau) + 2\Theta)] = 0$ (see Example 9.10). Thus
$X(t)$ is also a wide-sense stationary random process.
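
The averaging over the phase that leads to Eq. (10.63) can be checked numerically. The sketch below assumes illustrative values for $f_c$, $t$, and $\tau$ (none are from the text) and replaces the expectation over $\Theta$ by an average over equally spaced phases in $[0, 2\pi)$.

```octave
% Check that averaging over a uniform phase removes the dependence on t.
fc = 10;  t = 0.37;  tau = 0.012;        % illustrative values (assumed)
K  = 1000;
theta = 2*pi*(0:K-1)/K;                  % equally spaced phases on [0, 2*pi)

% Average of the product of carriers over the phase
avg = mean(cos(2*pi*fc*(t + tau) + theta) .* cos(2*pi*fc*t + theta));

% Value predicted by the derivation of Eq. (10.63)
expected = 0.5*cos(2*pi*fc*tau);
dev = abs(avg - expected)
```

Changing `t` leaves `avg` unchanged up to rounding, which is the wide-sense stationarity property stated above.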
The power spectral density of $X(t)$ is

$$ S_X(f) = \mathcal{F}\left\{\tfrac{1}{2}R_A(\tau)\cos(2\pi f_c\tau)\right\} = \tfrac{1}{4}S_A(f + f_c) + \tfrac{1}{4}S_A(f - f_c), \tag{10.64} $$


where we used the table of Fourier transforms in Appendix B. Figure 10.11(b) shows
$S_X(f)$. It can be seen that the power spectral density of the information signal has been
shifted to the regions around $\pm f_c$. $X(t)$ is an example of a bandpass signal. Bandpass
signals are characterized as having their power spectral density concentrated about
some frequency much greater than zero.

The transmission signal is demodulated by multiplying it by the carrier signal and
lowpass filtering, as shown in Fig. 10.12. Let

$$ Y(t) = X(t)\,2\cos(2\pi f_c t + \Theta). \tag{10.65} $$

Proceeding as above, we find that

$$ \begin{aligned}
S_Y(f) &= \tfrac{1}{2}S_X(f + f_c) + \tfrac{1}{2}S_X(f - f_c) \\
&= \tfrac{1}{2}\{S_A(f + 2f_c) + S_A(f)\} + \tfrac{1}{2}\{S_A(f) + S_A(f - 2f_c)\}.
\end{aligned} $$

The ideal lowpass filter passes $S_A(f)$ and blocks $S_A(f \pm 2f_c)$, which is centered about
$\pm 2f_c$, so the output of the lowpass filter has power spectral density

$$ S_Y(f) = S_A(f). $$


In fact, from Example 10.11 we know the output is the original information signal, A(t). 


FIGURE 10.12
AM demodulator.


FIGURE 10.13
(a) A general bandpass signal. (b) A real-valued even function
of f. (c) An imaginary odd function of f.


The modulation method in Eq. (10.62) can only produce bandpass signals for
which $S_X(f)$ is locally symmetric about $f_c$, $S_X(f_c + \delta f) = S_X(f_c - \delta f)$ for $|\delta f| < W$,
as in Fig. 10.11(b). The method cannot yield real-valued transmission signals whose
power spectral densities lack this symmetry, such as shown in Fig. 10.13(a). The following
quadrature amplitude modulation (QAM) method can be used to produce such signals:

$$ X(t) = A(t)\cos(2\pi f_c t + \Theta) + B(t)\sin(2\pi f_c t + \Theta), \tag{10.66} $$

where $A(t)$ and $B(t)$ are real-valued, jointly wide-sense stationary random processes,
and we require that

$$ R_A(\tau) = R_B(\tau) \tag{10.67a} $$
$$ R_{B,A}(\tau) = -R_{A,B}(\tau). \tag{10.67b} $$

Note that Eq. (10.67a) implies that $S_A(f) = S_B(f)$, a real-valued, even function of $f$, as
shown in Fig. 10.13(b). Note also that Eq. (10.67b) implies that $S_{B,A}(f)$ is a purely
imaginary, odd function of $f$, as also shown in Fig. 10.13(c) (see Problem 10.57).

Proceeding as before, we can show that $X(t)$ is a wide-sense stationary random
process with autocorrelation function

$$ R_X(\tau) = R_A(\tau)\cos(2\pi f_c\tau) + R_{B,A}(\tau)\sin(2\pi f_c\tau) \tag{10.68} $$

and power spectral density

$$ S_X(f) = \tfrac{1}{2}\{S_A(f - f_c) + S_A(f + f_c)\} + \tfrac{1}{2j}\{S_{B,A}(f - f_c) - S_{B,A}(f + f_c)\}. \tag{10.69} $$

The resulting power spectral density is as shown in Fig. 10.13(a). Thus QAM can be
used to generate real-valued bandpass signals with arbitrary power spectral density.

Bandpass random signals, such as those in Fig. 10.13(a), arise in communication
systems when wide-sense stationary white noise is filtered by bandpass filters. Let $N(t)$
be such a process with power spectral density $S_N(f)$. It can be shown that $N(t)$ can be
represented by

$$ N(t) = N_c(t)\cos(2\pi f_c t + \Theta) - N_s(t)\sin(2\pi f_c t + \Theta), \tag{10.70} $$

where $N_c(t)$ and $N_s(t)$ are jointly wide-sense stationary processes with

$$ S_{N_c}(f) = S_{N_s}(f) = \{S_N(f - f_c) + S_N(f + f_c)\}_L \tag{10.71} $$

and

$$ S_{N_c,N_s}(f) = j\{S_N(f - f_c) - S_N(f + f_c)\}_L, \tag{10.72} $$

where the subscript $L$ denotes the lowpass portion of the expression in brackets. In
words, every real-valued bandpass process can be treated as if it had been generated by
a QAM modulator.


Example 10.18 Demodulation of Noisy Signal


The received signal in an AM system is

$$ Y(t) = A(t)\cos(2\pi f_c t + \Theta) + N(t), $$

where $N(t)$ is a bandlimited white noise process with spectral density

$$ S_N(f) = \begin{cases} \dfrac{N_0}{2} & |f \pm f_c| < W \\ 0 & \text{elsewhere.} \end{cases} $$

Find the signal-to-noise ratio of the recovered signal.
Equation (10.70) allows us to represent the received signal by

$$ Y(t) = \{A(t) + N_c(t)\}\cos(2\pi f_c t + \Theta) - N_s(t)\sin(2\pi f_c t + \Theta). $$

The demodulator in Fig. 10.12 is used to recover $A(t)$. After multiplication by $2\cos(2\pi f_c t + \Theta)$,
we have

$$ \begin{aligned}
2Y(t)\cos(2\pi f_c t + \Theta) &= \{A(t) + N_c(t)\}\,2\cos^2(2\pi f_c t + \Theta) - N_s(t)\,2\cos(2\pi f_c t + \Theta)\sin(2\pi f_c t + \Theta) \\
&= \{A(t) + N_c(t)\}(1 + \cos(4\pi f_c t + 2\Theta)) - N_s(t)\sin(4\pi f_c t + 2\Theta).
\end{aligned} $$


After lowpass filtering, the recovered signal is

$$ A(t) + N_c(t). $$

The power in the signal and noise components, respectively, are

$$ \sigma_A^2 = \int_{-W}^{W} S_A(f)\,df $$
$$ \sigma_{N_c}^2 = \int_{-W}^{W} S_{N_c}(f)\,df = \int_{-W}^{W}\left(\frac{N_0}{2} + \frac{N_0}{2}\right)df = 2WN_0, $$

where the noise power follows from Eq. (10.71). The output signal-to-noise ratio is then

$$ \text{SNR} = \frac{\sigma_A^2}{2WN_0}. $$


10.4 OPTIMUM LINEAR SYSTEMS


Many problems can be posed in the following way. We observe a discrete-time, zero-mean
process $X_\alpha$ over a certain time interval $I = \{t - a, \ldots, t + b\}$, and we are required
to use the $a + b + 1$ resulting observations $\{X_{t-a}, \ldots, X_t, \ldots, X_{t+b}\}$ to obtain
an estimate $Y_t$ for some other (presumably related) zero-mean process $Z_t$. The estimate
$Y_t$ is required to be linear, as shown in Fig. 10.14:

$$ Y_t = \sum_{\beta=t-a}^{t+b} h_{t-\beta}X_\beta = \sum_{\beta=-b}^{a} h_\beta X_{t-\beta}. \tag{10.73} $$

The figure of merit for the estimator is the mean square error

$$ E[e_t^2] = E[(Z_t - Y_t)^2], \tag{10.74} $$

and we seek to find the optimum filter, which is characterized by the impulse response
$h_\beta$ that minimizes the mean square error.


FIGURE 10.14
A linear system for producing an estimate $Y_t$.

Examples 10.19 and 10.20 show that different choices of $Z_t$ and $X_\alpha$ and of observation
interval correspond to different estimation problems.


Example 10.19 Filtering and Smoothing Problems 
Let the observations be the sum of a “desired signal” $Z_\alpha$ plus unwanted “noise” $N_\alpha$:

$$ X_\alpha = Z_\alpha + N_\alpha \qquad \alpha \in I. $$

We are interested in estimating the desired signal at time $t$. The relation between $t$ and the
observation interval $I$ gives rise to a variety of estimation problems.

If $I = (-\infty, t)$, that is, $a = \infty$ and $b = 0$, then we have a filtering problem where we estimate
$Z_t$ in terms of noisy observations of the past and present. If $I = (t - a, t)$, then we have a
filtering problem in which we estimate $Z_t$ in terms of the $a + 1$ most recent noisy observations.

If $I = (-\infty, \infty)$, that is, $a = b = \infty$, then we have a smoothing problem where we are
attempting to recover the signal from its entire noisy version. There are applications where this
makes sense, for example, if the entire realization $X_\alpha$ has been recorded and the estimate of $Z_t$ is
obtained by “playing back” $X_\alpha$.


Example 10.20 Prediction 


Suppose we want to predict $Z_t$ in terms of its recent past: $\{Z_{t-a}, \ldots, Z_{t-1}\}$. The general
estimation problem becomes this prediction problem if we let the observation $X_\alpha$ be the past $a$ values
of the signal $Z_\alpha$, that is,

$$ X_\alpha = Z_\alpha \qquad t - a \le \alpha \le t - 1. $$

The estimate $Y_t$ is then a linear prediction of $Z_t$ in terms of its most recent values.


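10.4.1 The Orthogonality Condition


It is easy to show that the optimum filter must satisfy the orthogonality condition (see
Eq. 6.56), which states that the error $e_t$ must be orthogonal to all the observations $X_\alpha$, that is,

$$ 0 = E[e_t X_\alpha] = E[(Z_t - Y_t)X_\alpha] \qquad \text{for all } \alpha \in I, \tag{10.75} $$

or equivalently,

$$ E[Z_t X_\alpha] = E[Y_t X_\alpha] \qquad \text{for all } \alpha \in I. \tag{10.76} $$

If we substitute Eq. (10.73) into Eq. (10.76) we find

$$ \begin{aligned}
E[Z_t X_\alpha] &= E\left[\sum_{\beta=-b}^{a} h_\beta X_{t-\beta}X_\alpha\right] \\
&= \sum_{\beta=-b}^{a} h_\beta E[X_{t-\beta}X_\alpha] \\
&= \sum_{\beta=-b}^{a} h_\beta R_X(t - \alpha - \beta) \qquad \text{for all } \alpha \in I.
\end{aligned} \tag{10.77} $$

Equation (10.77) shows that $E[Z_t X_\alpha]$ depends only on $t - \alpha$, and thus $X_\alpha$ and $Z_t$
are jointly wide-sense stationary processes. Therefore, we can rewrite Eq. (10.77) as
follows:

$$ R_{Z,X}(t - \alpha) = \sum_{\beta=-b}^{a} h_\beta R_X(t - \beta - \alpha) \qquad t - a \le \alpha \le t + b. $$

Finally, letting $m = t - \alpha$, we obtain the following key equation:

$$ R_{Z,X}(m) = \sum_{\beta=-b}^{a} h_\beta R_X(m - \beta) \qquad -b \le m \le a. \tag{10.78} $$

The optimum linear filter must satisfy the set of $a + b + 1$ linear equations given by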
Eq. (10.78). Note that Eq. (10.78) is identical to Eq. (6.60) for estimating a random 
variable by a linear combination of several random variables. The wide-sense station- 
arity of the processes reduces this estimation problem to the one considered in 
Section 6.5. 
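
The linear equations in Eq. (10.78) can be set up and solved directly for a finite window. The sketch below is an illustration under assumptions that are not part of the derivation: the observation is signal plus independent white noise, the signal is first-order autoregressive with $R_Z(m) = \sigma_Z^2 r^{|m|}$, and the window is $a = b = 2$. The function handles `Rz`, `Rx`, and `Rzx` are hypothetical names.

```octave
% Sketch: solve Eq. (10.78) for a finite two-sided observation window.
% Assumed model: X = Z + N, Z first-order autoregressive, N white, independent of Z.
a = 2;  b = 2;                        % window {t-a, ..., t+b}
varZ = 4;  r = 0.9;  varN = 4;        % illustrative parameters
Rz  = @(m) varZ * r.^abs(m);          % signal autocorrelation R_Z(m)
Rx  = @(m) Rz(m) + varN*(m == 0);     % observation autocorrelation R_X(m)
Rzx = @(m) Rz(m);                     % cross-correlation R_{Z,X}(m)

m   = (-b:a).';                       % equation index m; also the range of beta
A   = Rx(m - m.');                    % A(i,k) = R_X(m_i - beta_k)
h   = A \ Rzx(m);                     % h(k) is the coefficient for beta = m(k)
mse = Rz(0) - h.' * Rzx(m)            % mean square error, Eq. (10.80)
```

Because the window is symmetric about $t$, the resulting coefficients are symmetric in $\beta$. Solving with the backslash operator avoids forming the matrix inverse explicitly.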

In the above derivation we deliberately used the notation $Z_t$ instead of $Z_n$ to suggest
that the same development holds for continuous-time estimation. In particular,
suppose we seek a linear estimate $Y(t)$ for the continuous-time random process $Z(t)$ in
terms of observations of the continuous-time random process $X(\alpha)$ in the time interval
$t - a \le \alpha \le t + b$:

$$ Y(t) = \int_{t-a}^{t+b} h(t - \beta)X(\beta)\,d\beta = \int_{-b}^{a} h(\beta)X(t - \beta)\,d\beta. $$

It can then be shown that the filter $h(\beta)$ that minimizes the mean square error is specified by

$$ R_{Z,X}(\tau) = \int_{-b}^{a} h(\beta)R_X(\tau - \beta)\,d\beta \qquad -b \le \tau \le a. \tag{10.79} $$

Thus in the time-continuous case we obtain an integral equation instead of a set of
linear equations. The analytic solution of this integral equation can be quite difficult,
but the equation can be solved numerically by approximating the integral by a
summation.†

We now determine the mean square error of the optimum filter. First we note
that for the optimum filter, the error $e_t$ and the estimate $Y_t$ are orthogonal since

$$ E[e_t Y_t] = E\left[e_t\sum_{\beta=t-a}^{t+b} h_{t-\beta}X_\beta\right] = \sum_{\beta=t-a}^{t+b} h_{t-\beta}E[e_t X_\beta] = 0, $$

where the terms inside the last summation are 0 because of Eq. (10.75). Since $e_t = Z_t - Y_t$,
the mean square error is then

$$ E[e_t^2] = E[e_t(Z_t - Y_t)] = E[e_t Z_t], $$

since $e_t$ and $Y_t$ are orthogonal. Substituting for $e_t$ yields

$$ E[e_t^2] = E[(Z_t - Y_t)Z_t] = E[Z_t Z_t] - E[Y_t Z_t] = R_Z(0) - \sum_{\beta=-b}^{a} h_\beta R_{Z,X}(\beta). \tag{10.80} $$


† Equation (10.79) can also be solved by using the Karhunen-Loeve expansion.


Similarly, it can be shown that the mean square error of the optimum filter in the
continuous-time case is

$$ E[e^2(t)] = R_Z(0) - \int_{-b}^{a} h(\beta)R_{Z,X}(\beta)\,d\beta. \tag{10.81} $$

The following theorems summarize the above results. 


Theorem


Let $X_t$ and $Z_t$ be discrete-time, zero-mean, jointly wide-sense stationary processes, and let $Y_t$ be
an estimate for $Z_t$ of the form

$$ Y_t = \sum_{\beta=t-a}^{t+b} h_{t-\beta}X_\beta = \sum_{\beta=-b}^{a} h_\beta X_{t-\beta}. $$

The filter that minimizes $E[(Z_t - Y_t)^2]$ satisfies the equation

$$ R_{Z,X}(m) = \sum_{\beta=-b}^{a} h_\beta R_X(m - \beta) \qquad -b \le m \le a $$

and has mean square error given by

$$ E[(Z_t - Y_t)^2] = R_Z(0) - \sum_{\beta=-b}^{a} h_\beta R_{Z,X}(\beta). $$


Theorem


Let $X(t)$ and $Z(t)$ be continuous-time, zero-mean, jointly wide-sense stationary processes, and let
$Y(t)$ be an estimate for $Z(t)$ of the form

$$ Y(t) = \int_{t-a}^{t+b} h(t - \beta)X(\beta)\,d\beta = \int_{-b}^{a} h(\beta)X(t - \beta)\,d\beta. $$

The filter $h(\beta)$ that minimizes $E[(Z(t) - Y(t))^2]$ satisfies the equation

$$ R_{Z,X}(\tau) = \int_{-b}^{a} h(\beta)R_X(\tau - \beta)\,d\beta \qquad -b \le \tau \le a $$

and has mean square error given by

$$ E[(Z(t) - Y(t))^2] = R_Z(0) - \int_{-b}^{a} h(\beta)R_{Z,X}(\beta)\,d\beta. $$


Example 10.21 Filtering of Signal Plus Noise 


Suppose we are interested in estimating the signal $Z_n$ from the $p + 1$ most recent noisy
observations:

$$ X_\alpha = Z_\alpha + N_\alpha \qquad \alpha \in I = \{n - p, \ldots, n - 1, n\}. $$

Find the set of linear equations for the optimum filter if $Z_\alpha$ and $N_\alpha$ are independent random
processes.
For this choice of observation interval, Eq. (10.78) becomes

$$ R_{Z,X}(m) = \sum_{\beta=0}^{p} h_\beta R_X(m - \beta) \qquad m \in \{0, 1, \ldots, p\}. \tag{10.82} $$

The cross-correlation terms in Eq. (10.82) are given by

$$ R_{Z,X}(m) = E[Z_n X_{n-m}] = E[Z_n(Z_{n-m} + N_{n-m})] = R_Z(m). $$

The autocorrelation terms are given by

$$ \begin{aligned}
R_X(m - \beta) &= E[X_{n-\beta}X_{n-m}] = E[(Z_{n-\beta} + N_{n-\beta})(Z_{n-m} + N_{n-m})] \\
&= R_Z(m - \beta) + R_{Z,N}(m - \beta) + R_{N,Z}(m - \beta) + R_N(m - \beta) \\
&= R_Z(m - \beta) + R_N(m - \beta),
\end{aligned} $$

since $Z_\alpha$ and $N_\alpha$ are independent random processes. Thus Eq. (10.82) for the optimum filter
becomes

$$ R_Z(m) = \sum_{\beta=0}^{p} h_\beta\{R_Z(m - \beta) + R_N(m - \beta)\} \qquad m \in \{0, 1, \ldots, p\}. \tag{10.83} $$

This set of $p + 1$ linear equations in $p + 1$ unknowns $h_\beta$ is solved by matrix inversion.


Example 10.22 Filtering of AR Signal Plus Noise 


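Find the set of equations for the optimum filter in Example 10.21 if $Z_\alpha$ is a first-order
autoregressive process with average power $\sigma_Z^2$ and parameter $r$, $|r| < 1$, and $N_\alpha$ is a white noise process
with average power $\sigma_N^2$.


The autocorrelation for a first-order autoregressive process is given by

$$ R_Z(m) = \sigma_Z^2 r^{|m|} \qquad m = 0, \pm 1, \pm 2, \ldots. $$

(See Problem 10.42.) The autocorrelation for the white noise process is

$$ R_N(m) = \sigma_N^2\,\delta(m). $$

Substituting $R_Z(m)$ and $R_N(m)$ into Eq. (10.83) yields the following set of linear equations:

$$ \sigma_Z^2 r^{|m|} = \sum_{\beta=0}^{p} h_\beta\left(\sigma_Z^2 r^{|m-\beta|} + \sigma_N^2\,\delta(m - \beta)\right) \qquad m \in \{0, \ldots, p\}. \tag{10.84} $$

If we divide both sides of Eq. (10.84) by $\sigma_Z^2$ and let $\Gamma = \sigma_N^2/\sigma_Z^2$, we obtain the following matrix
equation:

$$ \begin{bmatrix}
1 + \Gamma & r & r^2 & \cdots & r^p \\
r & 1 + \Gamma & r & \cdots & r^{p-1} \\
r^2 & r & 1 + \Gamma & \cdots & r^{p-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r^p & r^{p-1} & r^{p-2} & \cdots & 1 + \Gamma
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_p \end{bmatrix}
=
\begin{bmatrix} 1 \\ r \\ r^2 \\ \vdots \\ r^p \end{bmatrix}. \tag{10.85} $$

Note that when the noise power is zero, i.e., $\Gamma = 0$, then the solution is $h_0 = 1$, $h_j = 0$,
$j = 1, \ldots, p$, that is, no filtering is required to obtain $Z_n$.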

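Equation (10.85) can be readily solved using Octave. The following function will compute
the optimum linear coefficients and the mean square error of the optimum predictor:


function [mse] = Lin_Est_AR (order, rho, varsig, varnoise)
  % order = number of observations p + 1; rho = autoregressive parameter r
  n = [0:1:order-1]
  r = varsig*rho.^n;                         % R_Z(0), ..., R_Z(p)
  R = varnoise*eye(order) + toeplitz(r);     % R_Z + R_N, as in Eq. (10.83)
  H = inv(R)*transpose(r)                    % optimum coefficients (displayed)
  mse = varsig - transpose(H)*transpose(r);  % mean square error, Eq. (10.80)
endfunction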


Table 10.1 gives the values of the optimal predictor coefficients and the mean square error as
the order of the estimator is increased for the first-order autoregressive process with $\sigma_Z^2 = 4$, $r = 0.9$,
and noise variance $\sigma_N^2 = 4$. It can be seen that the predictor places heavier weight on more recent
samples, which is consistent with the higher correlation of such samples with the current sample. For
smaller values of $r$, the correlation for distant samples drops off more quickly and the coefficients
place even lower weighting on them. The mean square error can also be seen to decrease with
increasing order $p + 1$ of the estimator. Increasing the first few orders provides significant
improvements, but a point of diminishing returns is reached around $p + 1 = 3$.
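
The trend in the MSE column of Table 10.1 can be reproduced with a short self-contained loop. This is a sketch rather than the function used to generate the table: it solves Eq. (10.83) with the backslash operator instead of an explicit inverse, and the variable names are illustrative.

```octave
% Sketch reproducing the MSE column of Table 10.1 from Eq. (10.83).
varsig = 4;  rho = 0.9;  varnoise = 4;    % parameters quoted for Table 10.1
mse = zeros(1, 5);
for order = 1:5
  r = varsig * rho.^(0:order-1);          % R_Z(0), ..., R_Z(p) as a row vector
  R = toeplitz(r) + varnoise*eye(order);  % R_Z + R_N
  h = R \ r.';                            % optimum coefficients h_0, ..., h_p
  mse(order) = varsig - h.' * r.';        % mean square error, Eq. (10.80)
end
mse
```

The first two entries evaluate to 2.0000 and approximately 1.4922, in agreement with the table, and the sequence decreases as the order grows.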


TABLE 10.1 Effect of predictor order on MSE performance.

p + 1   MSE      Coefficients
1       2.0000   0.5
2       1.4922   0.37304  0.28213
3       1.3193   0.32983  0.22500  0.17017
4       1.2549   0.31374  0.20372  0.13897  0.10510
5       1.2302   0.30754  0.19552  0.12696  0.08661  0.065501


10.4.2 Prediction


The linear prediction problem arises in many signal processing applications. In
Example 6.31 in Chapter 6, we already discussed the linear prediction of speech signals.
In general, we wish to predict $Z_n$ in terms of $Z_{n-1}, Z_{n-2}, \ldots, Z_{n-p}$:

$$ Y_n = \sum_{\beta=1}^{p} h_\beta Z_{n-\beta}. $$

For this problem, $X_\alpha = Z_\alpha$, so Eq. (10.78) becomes

$$ R_Z(m) = \sum_{\beta=1}^{p} h_\beta R_Z(m - \beta) \qquad m \in \{1, \ldots, p\}. \tag{10.86a} $$


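In matrix form this equation becomes

$$ \begin{bmatrix} R_Z(1) \\ R_Z(2) \\ \vdots \\ R_Z(p) \end{bmatrix}
=
\begin{bmatrix}
R_Z(0) & R_Z(1) & R_Z(2) & \cdots & R_Z(p - 1) \\
R_Z(1) & R_Z(0) & R_Z(1) & \cdots & R_Z(p - 2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_Z(p - 1) & R_Z(p - 2) & \cdots & R_Z(1) & R_Z(0)
\end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_p \end{bmatrix}
= \mathbf{R}_Z\mathbf{h}. \tag{10.86b} $$

Equations (10.86a) and (10.86b) are called the Yule-Walker equations.
Equation (10.80) for the mean square error becomes

$$ E[e_n^2] = R_Z(0) - \sum_{\beta=1}^{p} h_\beta R_Z(\beta). \tag{10.87} $$

By inverting the $p \times p$ matrix $\mathbf{R}_Z$, we can solve for the vector of filter coefficients $\mathbf{h}$.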
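
As a small worked instance of the Yule-Walker equations, the sketch below assumes a first-order autoregressive process with $R_Z(m) = \sigma_Z^2 r^{|m|}$, $\sigma_Z^2 = 1$, $r = 0.7411$, and predictor order $p = 3$. The values are illustrative and the variable names are not from the text.

```octave
% Sketch: Yule-Walker equations (10.86) for an assumed first-order AR process.
varZ = 1;  r = 0.7411;  p = 3;        % illustrative parameters
Rz   = varZ * r.^(0:p);               % R_Z(0), ..., R_Z(p)
R    = toeplitz(Rz(1:p));             % p x p matrix with entries R_Z(|i-k|)
rhs  = Rz(2:p+1).';                   % [R_Z(1); ...; R_Z(p)]
h    = R \ rhs                        % predictor coefficients h_1, ..., h_p
mse  = Rz(1) - h.' * rhs              % mean square error, Eq. (10.87)
```

For this process only the most recent sample receives a nonzero weight, $h_1 = r$, and the mean square error equals $\sigma_Z^2(1 - r^2)$ up to rounding.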


Example 10.23 Prediction for Long-Range and Short-Range Dependent Processes 


Let $X_1(t)$ be a discrete-time first-order autoregressive process with $\sigma_X^2 = 1$ and $r = 0.7411$, and
let $X_2(t)$ be a discrete-time long-range dependent process with autocovariance given by Eq.
(9.109), $\sigma_X^2 = 1$, and $H = 0.9$. Both processes have $C_X(1) = 0.7411$, but the autocovariance of
$X_1(t)$ decreases exponentially while that of $X_2(t)$ has long-range dependence. Compare the
performance of the optimal linear predictor for these processes for short-term as well as long-term
predictions.

The optimum linear coefficients and the associated mean square error for the long-range
dependent process can be calculated using the following code. The function can be modified for
the autoregressive case.


function mse = Lin_Pred_LR (order, Hurst, varsig)
  n = [0:1:order-1]
  H2 = 2*Hurst
  % autocovariance at lags 0, ..., p-1
  r = varsig*((1+n).^H2 - 2*(n.^H2) + abs(n-1).^H2)/2
  % autocovariance at lags 1, ..., p
  rz = varsig*((2+n).^H2 - 2*((n+1).^H2) + (n).^H2)/2
  R = toeplitz(r);
  H = transpose(inv(R)*transpose(rz))    % predictor coefficients
  mse = varsig - H*transpose(rz)         % mean square error, Eq. (10.87)
endfunction


Table 10.2 below compares the mean square errors and the coefficients of the two processes
in the case of short-term prediction. The predictor for $X_1(t)$ attains all of the benefit of
prediction with a $p = 1$ system. The optimum predictors for higher-order systems set the other
coefficients to zero, and the mean square error remains at 0.45077. The predictor for $X_2(t)$
achieves most of the possible performance with a $p = 1$ system, but small reductions in mean
square error do accrue by adding more coefficients. This is due to the persistent correlation
among the values in $X_2(t)$.


TABLE 10.2(a) Short-term prediction: autoregressive,
r = 0.7411, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

p   MSE       Coefficients
1   0.45077   0.74110
2   0.45077   0.74110   0


TABLE 10.2(b) Short-term prediction: long-range dependent process,
Hurst = 0.9, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

p   MSE       Coefficients
1   0.45077   0.74110
2   0.43625   0.60809    0.17948
3   0.42712   0.582127   0.091520   0.144649
4   0.42253   0.567138   0.082037   0.084329   0.103620
5   0.41964   0.558567   0.075061   0.077543   0.056707   0.082719

Table 10.3 shows the dramatic impact of long-range dependence on prediction performance.
We modified Eq. (10.86) to provide the optimum linear predictor for $X_t$ based on two
observations $X_{t-10}$ and $X_{t-20}$ that are in the relatively remote past. $X_1(t)$ and its previous values are
almost uncorrelated, so the best predictor has a mean square error of almost 1, which is the
variance of $X_1(t)$. On the other hand, $X_2(t)$ retains significant correlation with its previous values and
so the mean square error provides a significant reduction from the unit variance. Note that the
second-order predictor places significant weight on the observation 20 samples in the past.
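
One possible way to carry out the modification described above is sketched here. It assumes the autocovariance $C_X(k) = \tfrac{\sigma_X^2}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)$, which is the form used in the listing for the long-range dependent process, and it builds the normal equations for observations at arbitrary lags. The handle `C` and the vector `lags` are hypothetical names.

```octave
% Sketch: optimum predictor from observations at arbitrary lags (here 10 and 20).
Hurst = 0.9;  varsig = 1;
% Assumed autocovariance of the long-range dependent process
C = @(k) varsig/2 * (abs(k+1).^(2*Hurst) - 2*abs(k).^(2*Hurst) + abs(k-1).^(2*Hurst));

lags = [10; 20];                 % observations X_{t-10} and X_{t-20}
R    = C(lags - lags.');         % covariances among the observations
rz   = C(lags);                  % covariances between X_t and the observations
h    = R \ rz                    % predictor coefficients
mse  = varsig - h.' * rz         % mean square error
```

With these parameters the coefficients and mean square error should be close to the last row of Table 10.3(b).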


TABLE 10.3(a) Long-term prediction: autoregressive,
r = 0.7411, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

p       MSE       Coefficients
10      0.99750   0.04977
10;20   0.99750   0.04977   0


TABLE 10.3(b) Long-term prediction: long-range dependent
process, Hurst = 0.9, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

p       MSE       Coefficients
10      0.79354   0.45438
10;20   0.74850   0.34614   0.23822


10.4.3 Estimation Using the Entire Realization of the Observed Process


Suppose that $Z_t$ is to be estimated by a linear function $Y_t$ of the entire realization of $X_\alpha$,
that is, $a = b = \infty$ and Eq. (10.73) becomes

$$ Y_t = \sum_{\beta=-\infty}^{\infty} h_\beta X_{t-\beta}. $$

In the case of continuous-time random processes, we have

$$ Y(t) = \int_{-\infty}^{\infty} h(\beta)X(t - \beta)\,d\beta. $$

The optimum filters must satisfy Eqs. (10.78) and (10.79), which in this case become

$$ R_{Z,X}(m) = \sum_{\beta=-\infty}^{\infty} h_\beta R_X(m - \beta) \qquad \text{for all } m \tag{10.88a} $$
$$ R_{Z,X}(\tau) = \int_{-\infty}^{\infty} h(\beta)R_X(\tau - \beta)\,d\beta \qquad \text{for all } \tau. \tag{10.88b} $$

The Fourier transform of the first equation and the Fourier transform of the second
equation both yield the same expression:

$$ S_{Z,X}(f) = H(f)S_X(f), $$

which is readily solved for the transfer function of the optimum filter:

$$ H(f) = \frac{S_{Z,X}(f)}{S_X(f)}. \tag{10.89} $$

The impulse response of the optimum filter is then obtained by taking the appropriate
inverse transform. In general the filter obtained from Eq. (10.89) will be noncausal,
that is, its impulse response is nonzero for $t < 0$. We already indicated that there are
applications where this makes sense, namely, in situations where the entire realization
$X_\alpha$ is recorded and the estimate of $Z_t$ is obtained in “nonreal time” by “playing
back” $X_\alpha$.


Example 10.24 Infinite Smoothing 


Find the transfer function for the optimum filter for estimating $Z(t)$ from $X(\alpha) = Z(\alpha) + N(\alpha)$,
$\alpha \in (-\infty, \infty)$, where $Z(\alpha)$ and $N(\alpha)$ are independent, zero-mean random processes.
The cross-correlation between the observation and the desired signal is

$$ \begin{aligned}
R_{Z,X}(\tau) &= E[Z(t + \tau)X(t)] = E[Z(t + \tau)(Z(t) + N(t))] \\
&= E[Z(t + \tau)Z(t)] + E[Z(t + \tau)N(t)] \\
&= R_Z(\tau),
\end{aligned} $$

since $Z(t)$ and $N(t)$ are zero-mean, independent random processes. The cross-power spectral
density is then

$$ S_{Z,X}(f) = S_Z(f). \tag{10.90} $$

The autocorrelation of the observation process is

$$ R_X(\tau) = E[(Z(t + \tau) + N(t + \tau))(Z(t) + N(t))] = R_Z(\tau) + R_N(\tau). $$

The corresponding power spectral density is

$$ S_X(f) = S_Z(f) + S_N(f). \tag{10.91} $$

Substituting Eqs. (10.90) and (10.91) into Eq. (10.89) gives

$$ H(f) = \frac{S_Z(f)}{S_Z(f) + S_N(f)}. \tag{10.92} $$

Note that the optimum filter $H(f)$ is nonzero only at the frequencies where $S_Z(f)$ is nonzero,
that is, where the signal has power content. By dividing the numerator and denominator of Eq.
(10.92) by $S_Z(f)$, we see that $H(f)$ emphasizes the frequencies where the ratio of signal to noise
power density is large.
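
To make Eq. (10.92) concrete, the sketch below evaluates the smoothing filter on a frequency grid. The spectra are assumptions chosen for illustration: $S_Z(f) = 2/(1 + 4\pi^2 f^2)$ and white noise with $S_N(f) = 1$, the same spectra that appear in Example 10.25.

```octave
% Sketch of Eq. (10.92) with assumed signal and noise spectra.
f  = (-200:200)/100;                 % frequency grid; f(201) is exactly 0
Sz = 2 ./ (1 + 4*pi^2*f.^2);         % assumed signal power spectral density
Sn = ones(size(f));                  % white noise with density 1 (assumed)
H  = Sz ./ (Sz + Sn);                % noncausal smoothing filter, Eq. (10.92)

% For these spectra the filter simplifies to 2/(3 + 4 pi^2 f^2)
max_dev = max(abs(H - 2 ./ (3 + 4*pi^2*f.^2)))
H0 = H(201)                          % gain at f = 0
```

The gain is largest at $f = 0$, where the signal-to-noise ratio is highest, and it decays as the signal spectrum falls below the noise level.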


10.4.4 Estimation Using Causal Filters


Now, suppose that $Z_t$ is to be estimated using only the past and present of $X_\alpha$, that is,
$I = (-\infty, t)$. Equations (10.78) and (10.79) become

$$ R_{Z,X}(m) = \sum_{\beta=0}^{\infty} h_\beta R_X(m - \beta) \qquad 0 \le m < \infty \tag{10.93a} $$
$$ R_{Z,X}(\tau) = \int_{0}^{\infty} h(\beta)R_X(\tau - \beta)\,d\beta \qquad 0 \le \tau < \infty. \tag{10.93b} $$

Equations (10.93a) and (10.93b) are called the Wiener-Hopf equations and, though similar
in appearance to Eqs. (10.88a) and (10.88b), are considerably more difficult to solve.

First, let us consider the special case where the observation process is white, that
is, for the discrete-time case $R_X(m) = \delta_m$. Equation (10.93a) is then

$$ R_{Z,X}(m) = \sum_{\beta=0}^{\infty} h_\beta\,\delta_{m-\beta} = h_m \qquad m \ge 0. \tag{10.94} $$

Thus in this special case, the optimum causal filter has coefficients given by

$$ h_m = \begin{cases} 0 & m < 0 \\ R_{Z,X}(m) & m \ge 0. \end{cases} $$

The corresponding transfer function is

$$ H(f) = \sum_{m=0}^{\infty} R_{Z,X}(m)\,e^{-j2\pi fm}. \tag{10.95} $$

Note Eq. (10.95) is not $S_{Z,X}(f)$, since the limits of the Fourier transform in Eq. (10.95) do
not extend from $-\infty$ to $+\infty$. However, $H(f)$ can be obtained from $S_{Z,X}(f)$ by finding
$h_m = \mathcal{F}^{-1}[S_{Z,X}(f)]$, keeping the causal part (i.e., $h_m$ for $m \ge 0$) and setting the non-
causal part to 0.

We now show how the solution of the above special case can be used to solve the
general case. It can be shown that under very general conditions, the power spectral
density of a random process can be factored into the form

$$ S_X(f) = |G(f)|^2 = G(f)G^*(f), \tag{10.96} $$

where $G(f)$ and $1/G(f)$ are causal filters. This suggests that we can find the optimum
filter in two steps, as shown in Fig. 10.15. First, we pass the observation process through
a “whitening” filter with transfer function $W(f) = 1/G(f)$ to produce a white noise
process $X'_n$, since

$$ S_{X'}(f) = |W(f)|^2 S_X(f) = \frac{|G(f)|^2}{|G(f)|^2} = 1 \qquad \text{for all } f. $$

Second, we find the best estimator for $Z_n$ using the whitened observation process
$X'_n$ as given by Eq. (10.95). The filter that results from the tandem combination of
the whitening filter and the estimation filter is the solution to the Wiener-Hopf
equations.

The transfer function of the second filter in Fig. 10.15 is

$$ H_2(f) = \sum_{m=0}^{\infty} R_{Z,X'}(m)\,e^{-j2\pi fm} \tag{10.97} $$

by Eq. (10.95). To evaluate Eq. (10.97) we need to find

$$ \begin{aligned}
R_{Z,X'}(k) &= E[Z_{n+k}X'_n] \\
&= \sum_{i=0}^{\infty} w_i E[Z_{n+k}X_{n-i}] \\
&= \sum_{i=0}^{\infty} w_i R_{Z,X}(k + i),
\end{aligned} \tag{10.98} $$

where $w_i$ is the impulse response of the whitening filter. The Fourier transform of
Eq. (10.98) gives an expression that is easier to work with:

$$ S_{Z,X'}(f) = W^*(f)S_{Z,X}(f) = \frac{S_{Z,X}(f)}{G^*(f)}. \tag{10.99} $$


FIGURE 10.15
Whitening filter approach for solving Wiener-Hopf equations.

The method for factoring $S_X(f)$ as specified by Eq. (10.96) is called spectral factorization. See Example
10.10 and the references at the end of the chapter.


The inverse Fourier transform of Eq. (10.99) yields the desired $R_{Z,X'}(k)$, which can
then be substituted into Eq. (10.97) to obtain $H_2(f)$.

In summary, the optimum filter is found using the following procedure:

1. Factor $S_X(f)$ as in Eq. (10.96) and obtain a causal whitening filter $W(f) = 1/G(f)$.
2. Find $R_{Z,X'}(k)$ from Eq. (10.98) or from Eq. (10.99).
3. $H_2(f)$ is then given by Eq. (10.97).
4. The optimum filter is then

$$ H(f) = W(f)H_2(f). \tag{10.100} $$


This procedure is valid for the continuous-time version of the optimum causal filter problem, 
after appropriate changes are made from summations to integrals. The following example con- 
siders a continuous-time problem. 




Example 10.25 Wiener Filter 


Find the optimum causal filter for estimating a signal $Z(t)$ from the observation $X(t) = Z(t) + N(t)$,
where $Z(t)$ and $N(t)$ are independent random processes, $N(t)$ is zero-mean white noise
with power spectral density 1, and $Z(t)$ has power spectral density

$$ S_Z(f) = \frac{2}{1 + 4\pi^2 f^2}. $$

The optimum filter in this problem is called the Wiener filter.
The cross-power spectral density between $Z(t)$ and $X(t)$ is

$$ S_{Z,X}(f) = S_Z(f), $$

since the signal and noise are independent random processes. The power spectral density for the
observation process is

$$ \begin{aligned}
S_X(f) &= S_Z(f) + S_N(f) = \frac{3 + 4\pi^2 f^2}{1 + 4\pi^2 f^2} \\
&= \left(\frac{j2\pi f + \sqrt{3}}{j2\pi f + 1}\right)\left(\frac{-j2\pi f + \sqrt{3}}{-j2\pi f + 1}\right).
\end{aligned} $$

If we let

$$ G(f) = \frac{j2\pi f + \sqrt{3}}{j2\pi f + 1}, $$

then it is easy to verify that $W(f) = 1/G(f)$ is the whitening causal filter.
Next we evaluate Eq. (10.99):

$$ \begin{aligned}
S_{Z,X'}(f) = \frac{S_{Z,X}(f)}{G^*(f)} &= \frac{2}{1 + 4\pi^2 f^2}\cdot\frac{1 - j2\pi f}{\sqrt{3} - j2\pi f} \\
&= \frac{2}{(1 + j2\pi f)(\sqrt{3} - j2\pi f)} \\
&= \frac{c}{1 + j2\pi f} + \frac{c}{\sqrt{3} - j2\pi f},
\end{aligned} \tag{10.101} $$

where $c = 2/(1 + \sqrt{3})$. If we take the inverse Fourier transform of $S_{Z,X'}(f)$, we obtain

$$ R_{Z,X'}(\tau) = \begin{cases} c\,e^{-\tau} & \tau > 0 \\ c\,e^{\sqrt{3}\tau} & \tau < 0. \end{cases} $$

Equation (10.97) states that $H_2(f)$ is given by the Fourier transform of the $\tau > 0$ portion of
$R_{Z,X'}(\tau)$:

$$ H_2(f) = \mathcal{F}\{c\,e^{-\tau}u(\tau)\} = \frac{c}{1 + j2\pi f}. $$

Note that we could have gotten this result directly from Eq. (10.101) by noting that only the first
term gives rise to the positive-time (i.e., causal) component.
The optimum filter is then

$$ H(f) = \frac{1}{G(f)}H_2(f) = \frac{1 + j2\pi f}{\sqrt{3} + j2\pi f}\cdot\frac{c}{1 + j2\pi f} = \frac{c}{\sqrt{3} + j2\pi f}. $$
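
The algebra in Example 10.25 can be checked numerically. The sketch below evaluates both sides of the spectral factorization, the partial-fraction expansion in Eq. (10.101), and the final filter on a frequency grid. The grid and variable names are illustrative choices.

```octave
% Numerical check of the factorization and partial fractions in Example 10.25.
f  = (-300:300)/100;                               % frequency grid (assumed)
s  = 1j*2*pi*f;                                    % shorthand for j 2 pi f
Sx = (3 + 4*pi^2*f.^2) ./ (1 + 4*pi^2*f.^2);       % S_X(f) = S_Z(f) + S_N(f)
G  = (s + sqrt(3)) ./ (s + 1);                     % proposed spectral factor
fact_err = max(abs(abs(G).^2 - Sx))                % checks S_X = |G|^2

c   = 2/(1 + sqrt(3));
lhs = (2 ./ (1 + 4*pi^2*f.^2)) ./ conj(G);         % S_{Z,X}(f)/G*(f)
rhs = c ./ (1 + s) + c ./ (sqrt(3) - s);           % right side of Eq. (10.101)
pf_err = max(abs(lhs - rhs))

H = (1 ./ G) .* (c ./ (1 + s));                    % W(f) H_2(f), Eq. (10.100)
h_err = max(abs(H - c ./ (sqrt(3) + s)))
```

All three residuals should be at the level of rounding error, which supports the factorization and the closed form of the optimum filter.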


10.5 THE KALMAN FILTER


The optimum linear systems considered in the previous section have two limitations:
(1) they assume wide-sense stationary signals; and (2) the number of equations grows
with the size of the observation set. In this section, we consider an estimation approach
that assumes signals have a certain structure. This assumption keeps the dimensionality
of the problem fixed even as the observation set grows. It also allows us to consider
certain nonstationary signals.

We will consider the class of signals that can be represented as shown in Fig. 10.16(a):

$$ Z_n = a_{n-1}Z_{n-1} + W_{n-1} \qquad n = 1, 2, \ldots, \tag{10.102} $$

where $Z_0$ is the random variable at time 0, $a_n$ is a known sequence of constants, and $W_n$ is
a sequence of zero-mean uncorrelated random variables with possibly time-varying
variances $\{E[W_n^2]\}$. The resulting process $Z_n$ is nonstationary in general. We assume that the
process $Z_n$ is not available to us, and that instead, as shown in Fig. 10.16(a), we observe

$$ X_n = Z_n + N_n \qquad n = 0, 1, 2, \ldots, \tag{10.103} $$

where the observation noise $N_n$ is a zero-mean, uncorrelated sequence of random variables
with possibly time-varying variances $\{E[N_n^2]\}$. We assume that $W_{n_1}$ and $N_{n_2}$ are
uncorrelated at all times $n_1$ and $n_2$. In the special case where $W_n$ and $N_n$ are Gaussian
random processes, then $Z_n$ and $X_n$ will also be Gaussian random processes. We will develop
the Kalman filter, which has the structure in Fig. 10.16(b).

Our objective is to find for each time $n$ the minimum mean square estimate (actually
prediction) of $Z_n$ based on the observations $X_0, X_1, \ldots, X_{n-1}$ using a linear estimator
that possibly varies with time:

$$ Y_n = \sum_{j=1}^{n} h_j^{(n-1)}X_{n-j}. \tag{10.104} $$


FIGURE 10.16
(a) Signal structure. (b) Kalman filter.


The orthogonality principle implies that the optimum filter $\{h_j^{(n-1)}\}$ satisfies

$$ E\left[\left(Z_n - \sum_{j=1}^{n} h_j^{(n-1)}X_{n-j}\right)X_l\right] = 0 \qquad \text{for } l = 0, 1, \ldots, n - 1, $$

which leads to a set of $n$ equations in $n$ unknowns:

$$ R_{Z,X}(n, l) = \sum_{j=1}^{n} h_j^{(n-1)}R_X(n - j, l) \qquad \text{for } l = 0, 1, \ldots, n - 1. \tag{10.105} $$

At the next time instant, we need to find

$$ Y_{n+1} = \sum_{j=1}^{n+1} h_j^{(n)}X_{n+1-j} \tag{10.106} $$

by solving a system of $(n + 1) \times (n + 1)$ equations:

$$ R_{Z,X}(n + 1, l) = \sum_{j=1}^{n+1} h_j^{(n)}R_X(n + 1 - j, l) \qquad \text{for } l = 0, 1, \ldots, n. \tag{10.107} $$

Up to this point we have followed the procedure of the previous section and we
find that the dimensionality of the problem grows with the number of observations.
We now use the signal structure to develop a recursive method for solving
Eq. (10.106).

We first need the following two results: For $l < n$, we have

$$ \begin{aligned}
R_{Z,X}(n + 1, l) &= E[Z_{n+1}X_l] = E[(a_n Z_n + W_n)X_l] \\
&= a_n R_{Z,X}(n, l) + E[W_n X_l] = a_n R_{Z,X}(n, l)
\end{aligned} \tag{10.108} $$

since $E[W_n X_l] = E[W_n]E[X_l] = 0$, that is, $W_n$ is uncorrelated with the past of the
process and the observations prior to time $n$, as can be seen from Fig. 10.16(a). Also for
$l < n$, we have

$$ \begin{aligned}
R_{Z,X}(n, l) &= E[Z_n X_l] = E[(X_n - N_n)X_l] \\
&= R_X(n, l) - E[N_n X_l] = R_X(n, l),
\end{aligned} \tag{10.109} $$

since $E[N_n X_l] = E[N_n]E[X_l] = 0$, that is, the observation noise at time $n$ is uncorrelated
with prior observations.


We now show that the set of equations in Eq. (10.107) can be related to the set in
Eq. (10.105). For $l < n$, we can equate the right-hand sides of Eqs. (10.108) and (10.107):

$$a_nR_{Z,X}(n, l) = \sum_{j=1}^{n+1} h_j^{(n)} R_X(n+1-j, l) = h_1^{(n)}R_X(n, l) + \sum_{j=2}^{n+1} h_j^{(n)} R_X(n+1-j, l) \quad \text{for } l = 0, 1, \ldots, n-1. \tag{10.110}$$
From Eq. (10.109) we have $R_X(n, l) = R_{Z,X}(n, l)$, so we can replace the first term on
the right-hand side of Eq. (10.110) and then move the resulting term to the left-hand side:

$$(a_n - h_1^{(n)})R_{Z,X}(n, l) = \sum_{j=2}^{n+1} h_j^{(n)} R_X(n+1-j, l) = \sum_{j'=1}^{n} h_{j'+1}^{(n)} R_X(n-j', l). \tag{10.111}$$
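By dividing both sides by $a_n - h_1^{(n)}$ we finally obtain

$$R_{Z,X}(n, l) = \sum_{j'=1}^{n} \frac{h_{j'+1}^{(n)}}{a_n - h_1^{(n)}} R_X(n-j', l) \quad \text{for } l = 0, 1, \ldots, n-1. \tag{10.112}$$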
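This set of equations is identical to Eq. (10.105) if we set

$$h_j^{(n-1)} = \frac{h_{j+1}^{(n)}}{a_n - h_1^{(n)}} \quad \text{for } j = 1, \ldots, n. \tag{10.113a}$$

Therefore, if at step $n$ we have found $h_1^{(n-1)}, \ldots, h_n^{(n-1)}$, and if somehow we have
found $h_1^{(n)}$, then we can find the remaining coefficients from

$$h_{j+1}^{(n)} = (a_n - h_1^{(n)})\,h_j^{(n-1)} \quad j = 1, \ldots, n. \tag{10.113b}$$

Thus the key question is how to find $h_1^{(n)}$.

Suppose we substitute the coefficients in Eq. (10.113b) into Eq. (10.106):

$$Y_{n+1} = h_1^{(n)}X_n + (a_n - h_1^{(n)})\sum_{j=1}^{n} h_j^{(n-1)} X_{n-j} = h_1^{(n)}X_n + (a_n - h_1^{(n)})Y_n$$
$$= a_nY_n + h_1^{(n)}(X_n - Y_n), \tag{10.114}$$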


where the second equality follows from Eq. (10.104). The above equation has a very
pleasing interpretation, as shown in Fig. 10.16(b). Since $Y_n$ is the prediction for time
$n$, $a_nY_n$ is the prediction for the next time instant, $n + 1$, based on the "old"
information (see Eq. (10.102)). The term $(X_n - Y_n)$ is called the "innovations," and it gives
the discrepancy between the old prediction and the observation. Finally, the term $h_1^{(n)}$ is
called the gain, henceforth denoted by $k_n$, and it indicates the extent to which the
innovations should be used to correct $a_nY_n$ to obtain the "new" prediction $Y_{n+1}$. If we
denote the innovations by

$$I_n = X_n - Y_n \tag{10.115}$$

then Eq. (10.114) becomes

$$Y_{n+1} = a_nY_n + k_nI_n. \tag{10.116}$$

We still need to determine a means for computing the gain $k_n$.
From Eq. (10.115), we have that the innovations satisfy

$$I_n = X_n - Y_n = Z_n + N_n - Y_n = Z_n - Y_n + N_n = \varepsilon_n + N_n,$$

where $\varepsilon_n = Z_n - Y_n$ is the prediction error. A recursive equation can be obtained for
the prediction error:

$$\varepsilon_{n+1} = Z_{n+1} - Y_{n+1} = a_nZ_n + W_n - a_nY_n - k_nI_n$$
$$= a_n(Z_n - Y_n) + W_n - k_n(\varepsilon_n + N_n)$$
$$= (a_n - k_n)\varepsilon_n + W_n - k_nN_n, \tag{10.117}$$

with initial condition $\varepsilon_0 = Z_0$. Since $X_0$, $W_n$, and $N_n$ are zero-mean, it then follows that
$E[\varepsilon_n] = 0$ for all $n$. A recursive equation for the mean square prediction error is
obtained from Eq. (10.117):

$$E[\varepsilon_{n+1}^2] = (a_n - k_n)^2E[\varepsilon_n^2] + E[W_n^2] + k_n^2E[N_n^2], \tag{10.118}$$

with initial condition $E[\varepsilon_0^2] = E[Z_0^2]$. We are finally ready to obtain an expression for
the gain $k_n$.

The gain $k_n$ must minimize the mean square error $E[\varepsilon_{n+1}^2]$. Therefore we can
differentiate Eq. (10.118) with respect to $k_n$ and set it equal to zero:

$$0 = -2(a_n - k_n)E[\varepsilon_n^2] + 2k_nE[N_n^2].$$

Then we can solve for $k_n$:

$$k_n = \frac{a_nE[\varepsilon_n^2]}{E[\varepsilon_n^2] + E[N_n^2]}. \tag{10.119}$$


The expression for the mean square prediction error in Eq. (10.118) can be simplified
by using Eq. (10.119) (see Problem 10.72):

$$E[\varepsilon_{n+1}^2] = a_n(a_n - k_n)E[\varepsilon_n^2] + E[W_n^2]. \tag{10.120}$$

Equations (10.119), (10.116), and (10.120) when combined yield the recursive
procedure that constitutes the Kalman filtering algorithm:


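Kalman filter algorithm:*
Initialization: $Y_0 = 0$, $E[\varepsilon_0^2] = E[Z_0^2]$
For $n = 0, 1, 2, \ldots$

$$k_n = \frac{a_nE[\varepsilon_n^2]}{E[\varepsilon_n^2] + E[N_n^2]}$$
$$Y_{n+1} = a_nY_n + k_n(X_n - Y_n)$$
$$E[\varepsilon_{n+1}^2] = a_n(a_n - k_n)E[\varepsilon_n^2] + E[W_n^2].$$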


Note that the algorithm requires knowledge of the signal structure, i.e., the $a_n$, and the
variances $E[N_n^2]$ and $E[W_n^2]$. The algorithm can be implemented easily and has
consequently found application in a broad range of detection, estimation, and signal
processing problems. The algorithm can be extended in matrix form to accommodate a
broader range of processes.
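The three-line recursion above translates almost directly into code. The following is a minimal Python sketch, not a reference implementation: the function names `kalman_predict` and `simulate` and their argument layout are our own assumptions, and the scalar signal model is the one of Fig. 10.16(a).

```python
import random


def kalman_predict(x, a, var_w, var_n, mse0):
    """One-step Kalman predictor for Z_{n+1} = a[n] Z_n + W_n, X_n = Z_n + N_n.

    x     : observations X_0, ..., X_{L-1}
    a     : coefficients a_0, ..., a_{L-1}
    var_w : E[W_n^2] for each n
    var_n : E[N_n^2] for each n
    mse0  : E[Z_0^2], the initial mean square prediction error
    Returns (y, k, mse): predictions Y_0..Y_L, gains k_0..k_{L-1},
    and mean square errors E[eps_n^2] for n = 0..L.
    """
    y, k, mse = [0.0], [], [mse0]
    for n in range(len(x)):
        kn = a[n] * mse[n] / (mse[n] + var_n[n])            # Eq. (10.119)
        y.append(a[n] * y[n] + kn * (x[n] - y[n]))           # Eq. (10.116)
        mse.append(a[n] * (a[n] - kn) * mse[n] + var_w[n])   # Eq. (10.120)
        k.append(kn)
    return y, k, mse


def simulate(a, var_w, var_n, length, rng):
    """Hypothetical helper: draw Z_n and X_n with Z_0 = 0 and constant parameters."""
    z, x, zn = [], [], 0.0
    for _ in range(length):
        z.append(zn)
        x.append(zn + rng.gauss(0.0, var_n ** 0.5))
        zn = a * zn + rng.gauss(0.0, var_w ** 0.5)
    return z, x


if __name__ == "__main__":
    rng = random.Random(0)
    L = 50
    z, x = simulate(0.8, 0.36, 1.0, L, rng)
    y, k, mse = kalman_predict(x, [0.8] * L, [0.36] * L, [1.0] * L, 0.0)
    print("last gain:", k[-1], "last mean square error:", mse[-1])
```

Note that the gain and error sequences do not depend on the data, so they could be precomputed; only the prediction line uses the observations.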


*We caution the student that there are two common ways of defining the gain. The statement of the Kalman
filter algorithm will differ accordingly in various textbooks.

Example 10.26 First-Order Autoregressive Process
Consider a signal defined by

$$Z_n = aZ_{n-1} + W_n \quad n = 1, 2, \ldots, \quad Z_0 = 0,$$

where $E[W_n^2] = \sigma_W^2 = 0.36$, and $a = 0.8$, and suppose the observations are made in additive
white noise

$$X_n = Z_n + N_n \quad n = 0, 1, 2, \ldots,$$

where $E[N_n^2] = 1$. Find the form of the predictor and its mean square error as $n \to \infty$.
The gain at step $n$ is given by Eq. (10.119) with $a_n = a$ and $E[N_n^2] = 1$:

$$k_n = \frac{aE[\varepsilon_n^2]}{1 + E[\varepsilon_n^2]}.$$

The mean square error sequence is therefore given by

$$E[\varepsilon_0^2] = E[Z_0^2] = 0$$
$$E[\varepsilon_{n+1}^2] = a(a - k_n)E[\varepsilon_n^2] + \sigma_W^2 = \frac{a^2}{1 + E[\varepsilon_n^2]}E[\varepsilon_n^2] + \sigma_W^2 \quad \text{for } n = 0, 1, 2, \ldots$$

The steady state mean square error $e_\infty$ must satisfy

$$e_\infty = \frac{a^2}{1 + e_\infty}e_\infty + \sigma_W^2.$$

For $a = 0.8$ and $\sigma_W^2 = 0.36$, the resulting quadratic equation yields $k_\infty = 0.3$ and $e_\infty = 0.6$.
Thus at steady state the predictor is

$$Y_{n+1} = 0.8Y_n + 0.3(X_n - Y_n).$$
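The steady-state values quoted in the example can be checked by solving the fixed-point equation in closed form. The sketch below is illustrative only: `steady_state` is a hypothetical helper, and it assumes constant $a$, $\sigma_W^2$, and noise variance $\sigma_N^2$ (equal to 1 in the example). Setting $E[\varepsilon_{n+1}^2] = E[\varepsilon_n^2] = e_\infty$ in Eqs. (10.119) and (10.120) gives the quadratic $e^2 + (\sigma_N^2(1 - a^2) - \sigma_W^2)e - \sigma_W^2\sigma_N^2 = 0$, of which we take the positive root.

```python
import math


def steady_state(a, var_w, var_n=1.0):
    """Return (k_inf, e_inf) for the scalar Kalman predictor with constant parameters.

    Solves e^2 + b e - var_w * var_n = 0 with b = var_n (1 - a^2) - var_w,
    which is the fixed point of Eq. (10.120) combined with Eq. (10.119).
    """
    b = var_n * (1.0 - a * a) - var_w
    e_inf = 0.5 * (-b + math.sqrt(b * b + 4.0 * var_w * var_n))  # positive root
    k_inf = a * e_inf / (e_inf + var_n)                          # Eq. (10.119)
    return k_inf, e_inf


if __name__ == "__main__":
    k_inf, e_inf = steady_state(0.8, 0.36)
    print("steady-state gain:", k_inf, "steady-state error:", e_inf)
```

For $a = 0.8$ and $\sigma_W^2 = 0.36$ the linear coefficient vanishes, so the quadratic reduces to $e_\infty^2 = 0.36$.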


ESTIMATING THE POWER SPECTRAL DENSITY 


Let $X_0, \ldots, X_{k-1}$ be $k$ observations of the discrete-time, zero-mean, wide-sense
stationary process $X_n$. The periodogram estimate for $S_X(f)$ is defined as

$$\tilde{p}_k(f) = \frac{1}{k}|\tilde{x}_k(f)|^2, \tag{10.121}$$

where $\tilde{x}_k(f)$ is obtained as a Fourier transform of the observation sequence:

$$\tilde{x}_k(f) = \sum_{m=0}^{k-1} X_m e^{-j2\pi fm}. \tag{10.122}$$

In Section 10.1 we showed that the expected value of the periodogram estimate is

$$E[\tilde{p}_k(f)] = \sum_{m'=-(k-1)}^{k-1}\left(1 - \frac{|m'|}{k}\right)R_X(m')e^{-j2\pi fm'}, \tag{10.123}$$

so $\tilde{p}_k(f)$ is a biased estimator for $S_X(f)$. However, as $k \to \infty$,

$$E[\tilde{p}_k(f)] \to S_X(f), \tag{10.124}$$

so the mean of the periodogram estimate approaches $S_X(f)$.

Before proceeding to find the variance of the periodogram estimate, we note that
the periodogram estimate is equivalent to taking the Fourier transform of an estimate
for the autocorrelation sequence; that is,

$$\tilde{p}_k(f) = \sum_{m=-(k-1)}^{k-1} \hat{r}_k(m)e^{-j2\pi fm}, \tag{10.125}$$

where the estimate for the autocorrelation is

$$\hat{r}_k(m) = \frac{1}{k}\sum_{n=0}^{k-|m|-1} X_nX_{n+|m|}. \tag{10.126}$$

(See Problem 10.77.)

FIGURE 10.17
Periodogram for 64 samples of white noise sequence $X_n$ iid uniform in (0, 1), $S_X(f) = \sigma_X^2 =
1/12 = 0.083$.


We might expect that as we increase the number of samples $k$, the periodogram
estimate converges to $S_X(f)$. This does not happen. Instead we find that $\tilde{p}_k(f)$ fluctuates
wildly about the true spectral density, and that this random variation does not decrease
with increased $k$ (see Fig. 10.17). To see why this happens, in the next section we compute
the statistics of the periodogram estimate for a white noise Gaussian random process. We
find that the estimates given by the periodogram have a variance that does not approach
zero as the number of samples is increased. This explains the lack of improvement in the
estimate as $k$ is increased. Furthermore, we show that the periodogram estimates are
uncorrelated at uniformly spaced frequencies in the interval $-1/2 \le f < 1/2$. This explains
the erratic appearance of the periodogram estimate as a function of $f$. In the final section,
we obtain another estimate for $S_X(f)$ whose variance does approach zero as $k$ increases.
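To make the definition concrete, the following Python sketch evaluates Eqs. (10.121) and (10.122) by direct summation on the grid $f = n/k$. It is a sketch under stated assumptions: the names `periodogram` and `periodogram_grid` are ours, a direct sum is used in place of an FFT for clarity, and the demo subtracts 1/2 from uniform (0, 1) samples so that the sequence is zero-mean.

```python
import cmath
import random


def periodogram(x, f):
    """p_k(f) = |sum_m x[m] exp(-j 2 pi f m)|^2 / k, Eqs. (10.121)-(10.122)."""
    k = len(x)
    xf = sum(xm * cmath.exp(-2j * cmath.pi * f * m) for m, xm in enumerate(x))
    return abs(xf) ** 2 / k


def periodogram_grid(x):
    """Evaluate the periodogram at f = n/k for -k/2 <= n < k/2 (k even)."""
    k = len(x)
    return [periodogram(x, n / k) for n in range(-(k // 2), k - k // 2)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption: zero-mean version of the uniform (0, 1) samples, variance 1/12.
    x = [rng.random() - 0.5 for _ in range(64)]
    p = periodogram_grid(x)
    print("smallest and largest estimates:", min(p), max(p))
    print("average over the grid:", sum(p) / len(p), "(compare with 1/12)")
```

The grid average equals the sample power $\frac{1}{k}\sum_m X_m^2$ by Parseval's relation, so it is close to $\sigma_X^2$ even though the individual values scatter widely.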


Variance of Periodogram Estimate 


Following the approach of [Jenkins and Watts, pp. 230-233], we consider the
periodogram of samples of a white noise process with $S_X(f) = \sigma_X^2$ at the frequencies
$f = n/k$, $-k/2 \le n < k/2$, which will cover the frequency range $-1/2 \le f < 1/2$.
(In practice these are the frequencies we would evaluate if we were using the FFT
algorithm to compute $\tilde{x}_k(f)$.) First we rewrite Eq. (10.122) at $f = n/k$ as follows:

$$\tilde{x}_k\left(\frac{n}{k}\right) = \sum_{m=0}^{k-1} X_m\left(\cos\left(\frac{2\pi mn}{k}\right) - j\sin\left(\frac{2\pi mn}{k}\right)\right) = A_k(n) - jB_k(n) \quad -k/2 \le n < k/2, \tag{10.127}$$

where

$$A_k(n) = \sum_{m=0}^{k-1} X_m\cos\left(\frac{2\pi mn}{k}\right) \tag{10.128}$$

and

$$B_k(n) = \sum_{m=0}^{k-1} X_m\sin\left(\frac{2\pi mn}{k}\right). \tag{10.129}$$

Then it follows that the periodogram estimate is

$$\tilde{p}_k\left(\frac{n}{k}\right) = \frac{1}{k}\left|\tilde{x}_k\left(\frac{n}{k}\right)\right|^2 = \frac{1}{k}\{A_k^2(n) + B_k^2(n)\}. \tag{10.130}$$

We find the variance of $\tilde{p}_k(n/k)$ from the statistics of $A_k(n)$ and $B_k(n)$.
The random variables $A_k(n)$ and $B_k(n)$ are defined as linear functions of the
jointly Gaussian random variables $X_0, \ldots, X_{k-1}$. Therefore $A_k(n)$ and $B_k(n)$ are also
jointly Gaussian random variables. If we take the expected value of Eqs. (10.128) and
(10.129) we find

$$E[A_k(n)] = 0 = E[B_k(n)] \quad \text{for all } n. \tag{10.131}$$

Note also that the $n = -k/2$ and $n = 0$ terms are different in that

$$B_k(-k/2) = 0 = B_k(0) \tag{10.132a}$$
$$A_k(-k/2) = \sum_{i=0}^{k-1}(-1)^iX_i, \quad A_k(0) = \sum_{i=0}^{k-1}X_i. \tag{10.132b}$$


The correlation between $A_k(n)$ and $A_k(m)$ (for $n$, $m$ not equal to $-k/2$ or 0) is

$$E[A_k(n)A_k(m)] = \sum_{i=0}^{k-1}\sum_{l=0}^{k-1} E[X_iX_l]\cos\left(\frac{2\pi ni}{k}\right)\cos\left(\frac{2\pi ml}{k}\right)$$
$$= \sigma_X^2\sum_{i=0}^{k-1}\cos\left(\frac{2\pi ni}{k}\right)\cos\left(\frac{2\pi mi}{k}\right)$$
$$= \sigma_X^2\sum_{i=0}^{k-1}\frac{1}{2}\cos\left(\frac{2\pi(n-m)i}{k}\right) + \sigma_X^2\sum_{i=0}^{k-1}\frac{1}{2}\cos\left(\frac{2\pi(n+m)i}{k}\right),$$

where we used the fact that $E[X_iX_l] = \sigma_X^2\delta_{il}$ since the noise is white. The second
summation is equal to zero, and the first summation is zero except when $n = m$. Thus

$$E[A_k(n)A_k(m)] = \frac{1}{2}k\sigma_X^2\delta_{nm} \quad \text{for all } n, m \ne -k/2, 0. \tag{10.133a}$$

It can similarly be shown that

$$E[B_k(n)B_k(m)] = \frac{1}{2}k\sigma_X^2\delta_{nm} \quad n, m \ne -k/2, 0 \tag{10.133b}$$
$$E[A_k(n)B_k(m)] = 0 \quad \text{for all } n, m. \tag{10.133c}$$

When $n = -k/2$ or 0, we have

$$E[A_k(n)A_k(m)] = k\sigma_X^2\delta_{nm} \quad \text{for all } m. \tag{10.133d}$$

Equations (10.133a) through (10.133d) imply that $A_k(n)$ and $B_k(m)$ are uncorrelated
random variables. Since $A_k(n)$ and $B_k(n)$ are jointly Gaussian random variables, this
implies that they are zero-mean, independent Gaussian random variables.

We are now ready to find the statistics of the periodogram estimates at the
frequencies $f = n/k$. Equation (10.130) gives

$$\tilde{p}_k\left(\frac{n}{k}\right) = \frac{1}{k}\{A_k^2(n) + B_k^2(n)\} \quad n \ne -k/2, 0$$
$$= \frac{1}{2}\sigma_X^2\left\{\frac{A_k^2(n)}{(1/2)k\sigma_X^2} + \frac{B_k^2(n)}{(1/2)k\sigma_X^2}\right\}. \tag{10.134}$$

The quantity in brackets is the sum of the squares of two zero-mean, unit-variance,
independent Gaussian random variables. This is a chi-square random variable with two
degrees of freedom (see Problem 7.6). From Table 4.1, we see that a chi-square random
variable with $v$ degrees of freedom has variance $2v$. Thus the expression in the brackets
has variance 4, and the periodogram estimate $\tilde{p}_k(n/k)$ has variance

$$\text{VAR}\left[\tilde{p}_k\left(\frac{n}{k}\right)\right] = \left(\frac{1}{2}\sigma_X^2\right)^2 4 = \sigma_X^4 = S_X(f)^2. \tag{10.135a}$$

For $n = -k/2$ and $n = 0$,

$$\tilde{p}_k\left(\frac{n}{k}\right) = \sigma_X^2\left\{\frac{A_k^2(n)}{k\sigma_X^2}\right\}.$$

The quantity in brackets is a chi-square random variable with one degree of freedom
and variance 2, so the variance of the periodogram estimate is

$$\text{VAR}\left[\tilde{p}_k\left(\frac{n}{k}\right)\right] = 2\sigma_X^4 \quad n = -k/2, 0. \tag{10.135b}$$


Thus we conclude from Eqs. (10.135a) and (10.135b) that the variance of the
periodogram estimate is proportional to the square of the power spectral density and does not
approach zero as $k$ increases. In addition, Eqs. (10.133a) through (10.133d) imply that the
periodogram estimates at the frequencies $f = n/k$ are uncorrelated random variables. A
more detailed analysis [Jenkins and Watts, p. 238] shows that for arbitrary $f$,

$$\text{VAR}[\tilde{p}_k(f)] = S_X(f)^2\left\{1 + \left(\frac{\sin(2\pi fk)}{k\sin(2\pi f)}\right)^2\right\}. \tag{10.136}$$

Thus the variance of the periodogram estimate does not approach zero as the number of
samples is increased.
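A small Monte Carlo experiment illustrates Eqs. (10.135a) and (10.135b). The Python sketch below is only an illustration under our own assumptions: the helper names, the number of trials, and the choice of unit-variance Gaussian white noise are not from the text. It estimates the variance of $\tilde{p}_k(n/k)$ for several block sizes.

```python
import cmath
import random
import statistics


def periodogram_at(x, n):
    """Periodogram of x at the grid frequency f = n/k, Eq. (10.130)."""
    k = len(x)
    xf = sum(xm * cmath.exp(-2j * cmath.pi * n * m / k) for m, xm in enumerate(x))
    return abs(xf) ** 2 / k


def periodogram_variance(k, n, trials, rng, sigma=1.0):
    """Sample variance of p_k(n/k) over independent Gaussian white noise blocks."""
    values = [
        periodogram_at([rng.gauss(0.0, sigma) for _ in range(k)], n)
        for _ in range(trials)
    ]
    return statistics.variance(values)


if __name__ == "__main__":
    rng = random.Random(1)
    for k in (16, 64, 256):
        v = periodogram_variance(k, 3, 500, rng)
        print("k =", k, "estimated variance at n = 3:", v)
    print("estimated variance at n = 0:", periodogram_variance(16, 0, 500, rng))
```

With $\sigma_X^2 = 1$ the estimates should scatter around 1 for every $k$ at $n \ne -k/2, 0$, and around 2 at $n = 0$, rather than shrinking as $k$ grows.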

The above discussion has only considered the spectrum estimation for a white
noise, Gaussian random process, but the general conclusions are also valid for nonwhite,
non-Gaussian processes. If the $X_i$ are not Gaussian, we note from Eqs. (10.128)
and (10.129) that $A_k$ and $B_k$ are approximately Gaussian by the central limit theorem
if $k$ is large. Thus the periodogram estimate is then approximately a chi-square random
variable.

If the process $X_n$ is not white, then it can be viewed as filtered white noise:

$$X_n = h_n * W_n,$$

where $S_W(f) = \sigma_W^2$ and $|H(f)|^2S_W(f) = S_X(f)$. The periodograms of $X_n$ and $W_n$ are
related by

$$\frac{1}{k}\left|\tilde{x}_k\left(\frac{n}{k}\right)\right|^2 = \left|H\left(\frac{n}{k}\right)\right|^2\frac{1}{k}\left|\tilde{w}_k\left(\frac{n}{k}\right)\right|^2. \tag{10.137}$$

Thus

$$\frac{1}{k}\left|\tilde{w}_k\left(\frac{n}{k}\right)\right|^2 = \frac{|\tilde{x}_k(n/k)|^2}{k|H(n/k)|^2}. \tag{10.138}$$

From our previous results, we know that $|\tilde{w}_k(n/k)|^2/k$ is a chi-square random variable
with variance $\sigma_W^4$. This implies that

$$\text{VAR}\left[\frac{|\tilde{x}_k(n/k)|^2}{k}\right] = \left|H\left(\frac{n}{k}\right)\right|^4\sigma_W^4 = S_X(f)^2. \tag{10.139}$$

Thus we conclude that the variance of the periodogram estimate for nonwhite noise is
also proportional to $S_X(f)^2$.


Smoothing of Periodogram Estimate 


A fundamental result in probability theory is that the sample mean of a sequence of
independent realizations of a random variable approaches the true mean with
probability one. We obtain an estimate for $S_X(f)$ whose variance goes to zero as the number
of observations increases by taking the average of $N$ independent periodograms on
samples of size $k$:

$$\langle\tilde{p}_k(f)\rangle_N = \frac{1}{N}\sum_{i=1}^{N}\tilde{p}_{k,i}(f), \tag{10.140}$$

where $\{\tilde{p}_{k,i}(f)\}$ are $N$ independent periodograms computed using separate sets of $k$
samples each. Figures 10.18 and 10.19 show the $N = 10$ and $N = 50$ smoothed
periodograms corresponding to the unsmoothed periodogram of Fig. 10.17. It is evident
that the variance of the power spectrum estimates is decreasing with $N$.

The mean of the smoothed estimator is

$$E\langle\tilde{p}_k(f)\rangle_N = \frac{1}{N}\sum_{i=1}^{N}E[\tilde{p}_{k,i}(f)] = E[\tilde{p}_k(f)]$$
$$= \sum_{m'=-(k-1)}^{k-1}\left(1 - \frac{|m'|}{k}\right)R_X(m')e^{-j2\pi fm'}, \tag{10.141}$$

where we have used Eq. (10.35). Thus the smoothed estimator has the same mean as
the periodogram estimate on a sample of size $k$.

FIGURE 10.18
Sixty-four-point smoothed periodogram with $N = 10$, $X_n$ iid uniform in (0, 1),
$S_X(f) = 1/12 = 0.083$.

FIGURE 10.19
Sixty-four-point smoothed periodogram with $N = 50$, $X_n$ iid uniform in (0, 1),
$S_X(f) = 1/12 = 0.083$.


The variance of the smoothed estimator is

$$\text{VAR}[\langle\tilde{p}_k(f)\rangle_N] = \frac{1}{N^2}\sum_{i=1}^{N}\text{VAR}[\tilde{p}_{k,i}(f)] = \frac{1}{N}\text{VAR}[\tilde{p}_k(f)].$$


Thus the variance of the smoothed estimator can be reduced by increasing N, the num- 
ber of periodograms used in Eq. (10.140). 

In practice, a sample set of size $Nk$, $X_0, \ldots, X_{Nk-1}$, is divided into $N$ blocks and a
separate periodogram is computed for each block. The smoothed estimate is then the
average over the $N$ periodograms. This method is called Bartlett's smoothing procedure.
Note that, in general, the resulting periodograms are not independent because the un- 
derlying blocks are not independent. Thus this smoothing procedure must be viewed as 
an approximation to the computation and averaging of independent periodograms. 

The choice of k and N is determined by the desired frequency resolution and 
variance of the estimate. The blocksize k determines the number of frequencies for 
which the spectral density is computed (i.e., the frequency resolution). The variance of 
the estimate is controlled by the number of periodograms N. The actual choice of k and 
N depends on the nature of the signal being investigated. 
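Bartlett's procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming non-overlapping blocks and a direct-sum periodogram; the function names `periodogram` and `bartlett` are hypothetical, and leftover samples that do not fill a block are simply discarded.

```python
import cmath
import random
import statistics


def periodogram(x, f):
    """p_k(f) for one block x of length k, Eqs. (10.121)-(10.122)."""
    k = len(x)
    xf = sum(xm * cmath.exp(-2j * cmath.pi * f * m) for m, xm in enumerate(x))
    return abs(xf) ** 2 / k


def bartlett(x, k, f):
    """Average of the periodograms of consecutive length-k blocks, Eq. (10.140)."""
    n_blocks = len(x) // k
    if n_blocks == 0:
        raise ValueError("need at least k samples")
    blocks = [x[i * k:(i + 1) * k] for i in range(n_blocks)]
    return sum(periodogram(b, f) for b in blocks) / n_blocks


if __name__ == "__main__":
    rng = random.Random(2)
    k, f = 16, 0.25
    for n_blocks in (1, 10, 50):
        estimates = [
            bartlett([rng.gauss(0.0, 1.0) for _ in range(n_blocks * k)], k, f)
            for _ in range(200)
        ]
        print("N =", n_blocks, "variance of estimate:", statistics.variance(estimates))
```

For white noise the printed variances should fall roughly as $1/N$, which is the tradeoff described above: for a fixed record length, a larger $N$ lowers the variance but leaves fewer samples per block and hence coarser frequency resolution.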


10.7 NUMERICAL TECHNIQUES FOR PROCESSING RANDOM SIGNALS


In this chapter our discussion has combined notions from random processes with basic 
concepts from signal processing. The processing of signals is a very important area in 
modern technology and a rich set of techniques and methodologies have been devel- 
oped to address the needs of specific application areas such as communication systems, 
speech compression, speech recognition, video compression, face recognition, network 
and service traffic engineering, etc. In this section we briefly present a number of gen- 
eral tools available for the processing of random signals. We focus on the tools provid- 
ed in Octave since these are quite useful as well as readily available. 


10.7.1 FFT Techniques


The Fourier transform relationship between Ry(7) and Sy(f) is fundamental in the 
study of wide-sense stationary processes and plays a key role in random signal analysis. 
The fast Fourier transform (FFT) methods we developed in Section 7.6 can be applied
to the numerical transformation from autocorrelation functions to power spectral
densities and back.

Consider the computation of $R_X(\tau)$ and $S_X(f)$ for continuous-time processes:

$$R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)e^{j2\pi f\tau}\,df \approx \int_{-W}^{W} S_X(f)e^{j2\pi f\tau}\,df.$$

First we limit the integral to the region where $S_X(f)$ has significant power. Next we
restrict our attention to a discrete set of $N = 2M$ frequency values at $kf_0$ so that
$-W = -Mf_0 < (-M + 1)f_0 < \cdots < (M - 1)f_0 < W$, and then approximate the
integral by a sum:

$$R_X(\tau) \approx \sum_{m=-M}^{M-1} S_X(mf_0)e^{j2\pi mf_0\tau}f_0.$$

Finally, we also focus on a set of discrete lag values: $kt_0$ so that
$-T = -Mt_0 < (-M + 1)t_0 < \cdots < (M - 1)t_0 < T$. We obtain the DFT as follows:

$$R_X(kt_0) = f_0\sum_{m=-M}^{M-1} S_X(mf_0)e^{j2\pi mkt_0f_0} = f_0\sum_{m=-M}^{M-1} S_X(mf_0)e^{j2\pi mk/N}. \tag{10.142}$$

In order to have a discrete Fourier transform, we must have $t_0f_0 = 1/N$, which is
equivalent to: $t_0 = 1/Nf_0$ and $T = Mt_0 = 1/2f_0$ and $W = Mf_0 = 1/2t_0$. We can use the FFT
function introduced in Section 7.6 to perform the transformation in Eq. (10.142) to
obtain the set of values $\{R_X(kt_0), k \in [-M, M - 1]\}$ from $\{S_X(mf_0), m \in [-M, M - 1]\}$.
The transformation in the reverse direction is done in the same way. Since Ry(7) and 
Sx(f) are even functions various simplifications are possible. We discuss some of these 
in the problems. 

Consider the computation of $S_X(f)$ and $R_X(k)$ for discrete-time processes. $S_X(f)$
spans the range of frequencies $|f| < 1/2$, so we restrict attention to $N$ points $1/N$ apart:

$$S_X\left(\frac{m}{N}\right) = \sum_{k=-\infty}^{\infty} R_X(k)e^{-j2\pi kf}\Big|_{f=m/N} \approx \sum_{k=-M}^{M-1} R_X(k)e^{-j2\pi km/N}. \tag{10.143}$$

The approximation here involves neglecting autocorrelation terms outside $[-M, M - 1]$.
Since $df \approx 1/N$, the transformation in the reverse direction is scaled differently:

$$R_X(k) = \int_{-1/2}^{1/2} S_X(f)e^{j2\pi kf}\,df \approx \frac{1}{N}\sum_{m=-M}^{M-1} S_X\left(\frac{m}{N}\right)e^{j2\pi km/N}. \tag{10.144}$$


We assume that the student has already tried the FFT exercises in Section 7.6, so we 
leave examples in the use of the FFT to the Problems. 
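As a quick numerical check of Eqs. (10.143) and (10.144), the Python sketch below evaluates both sums directly (an FFT routine would produce the same numbers faster). The function names and the test autocorrelation $R_X(k) = \alpha^{|k|}$ are our own choices for illustration; for that autocorrelation the exact spectral density is $(1 - \alpha^2)/(1 - 2\alpha\cos 2\pi f + \alpha^2)$.

```python
import cmath
import math


def psd_from_autocorr(r, M):
    """Eq. (10.143): S(m/N) ~ sum_{k=-M}^{M-1} r(k) exp(-j 2 pi k m / N), N = 2M.

    r is a function of the integer lag k; the result is ordered m = -M, ..., M-1.
    """
    N = 2 * M
    return [
        sum(r(k) * cmath.exp(-2j * math.pi * k * m / N) for k in range(-M, M)).real
        for m in range(-M, M)
    ]


def autocorr_from_psd(S, M):
    """Eq. (10.144): r(k) ~ (1/N) sum_{m=-M}^{M-1} S(m/N) exp(+j 2 pi k m / N)."""
    N = 2 * M
    return [
        sum(S[m + M] * cmath.exp(2j * math.pi * k * m / N) for m in range(-M, M)).real / N
        for k in range(-M, M)
    ]


if __name__ == "__main__":
    alpha, M = 0.5, 32
    S = psd_from_autocorr(lambda k: alpha ** abs(k), M)
    exact0 = (1 - alpha ** 2) / (1 - 2 * alpha + alpha ** 2)
    print("S(0) from the sum:", S[M], "exact:", exact0)
    r_back = autocorr_from_psd(S, M)
    print("recovered R(0), R(1):", r_back[M], r_back[M + 1])
```

Because the neglected lags decay like $\alpha^M$, the truncation error here is negligible; a slowly decaying autocorrelation would need a larger $M$.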

The various frequency domain results for linear systems that relate input, output, 
and cross-spectral densities can be evaluated numerically using the FFT. 


Example 10.27 Output Autocorrelation and Cross-Correlation 


Consider Example 10.12, where a random telegraph signal $X(t)$ with $\alpha = 1$ is passed through a
lowpass filter with $\beta = 1$ and $\beta = 10$. Find $R_Y(\tau)$.
The random telegraph has $S_X(f) = \alpha/(\alpha^2 + \pi^2f^2)$ and the filter has transfer function
$H(f) = \beta/(\beta + j2\pi f)$, so $R_Y(\tau)$ is given by:

$$R_Y(\tau) = \int_{-\infty}^{\infty}\frac{\beta^2}{\beta^2 + 4\pi^2f^2}\,\frac{\alpha}{\alpha^2 + \pi^2f^2}\,e^{j2\pi f\tau}\,df.$$

FIGURE 10.20
(a) Transfer function and input power spectral density; (b) Autocorrelation of filtered random telegraph with filter $\beta = 10$.

We used an $N = 256$ FFT to evaluate autocorrelation functions numerically for $\alpha = 1$ and
$\beta = 1$ and $\beta = 10$. Figure 10.20(a) shows $|H(f)|^2$ and $S_X(f)$ for $\beta = 10$. It can be seen that the
transfer function (the dashed line) is close to 1 in the region of $f$ where $S_X(f)$ has most of its
power. Consequently we expect the output for $\beta = 10$ to have an autocorrelation similar to that
of the input. For $\beta = 1$, on the other hand, the filter will attenuate more of the significant
frequencies of $X(t)$ and we expect more change in the output autocorrelation. Figure 10.20(b)
shows the output autocorrelation and we see that indeed for $\beta = 10$ (the solid line), $R_Y(\tau)$ is
close to the double-sided exponential of $R_X(\tau)$. For $\beta = 1$ the output autocorrelation differs
significantly from $R_X(\tau)$.
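The computation in this example can be reproduced with Eq. (10.142). The Python sketch below is a rough illustration, not the procedure actually used for Fig. 10.20: the frequency spacing $f_0 = 0.125$ is our assumption, and the sum is evaluated directly instead of with an FFT. As a sanity check, integrating the output spectral density in closed form for $\alpha = 1$ gives $R_Y(0) = \beta/(\beta + 2)$.

```python
import cmath
import math


def output_autocorr(alpha, beta, f0, M):
    """Return (t0, r) with r[k + M] ~ R_Y(k t0), k = -M..M-1, using Eq. (10.142).

    S_Y(f) = |H(f)|^2 S_X(f) with S_X(f) = alpha / (alpha^2 + pi^2 f^2)
    and |H(f)|^2 = beta^2 / (beta^2 + 4 pi^2 f^2).
    """
    N = 2 * M
    t0 = 1.0 / (N * f0)  # lag spacing required by t0 * f0 = 1/N

    def s_y(f):
        s_x = alpha / (alpha ** 2 + (math.pi * f) ** 2)
        h2 = beta ** 2 / (beta ** 2 + (2.0 * math.pi * f) ** 2)
        return h2 * s_x

    s = [s_y(m * f0) for m in range(-M, M)]
    r = [
        f0 * sum(s[m + M] * cmath.exp(2j * math.pi * m * k / N) for m in range(-M, M)).real
        for k in range(-M, M)
    ]
    return t0, r


if __name__ == "__main__":
    M = 128  # N = 256 points, as in the example
    for beta in (1.0, 10.0):
        t0, r = output_autocorr(1.0, beta, 0.125, M)
        print("beta =", beta, "R_Y(0) ~", r[M], "closed form:", beta / (beta + 2.0))
```

The value at lag zero is the output power, so the two printed numbers per line should agree closely; the small residual comes from truncating the integral to $|f| < W$ and from the periodicity that the DFT imposes in $\tau$.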


10.7.2 Filtering Techniques


The autocorrelation and power spectral density functions provide us with information 
about the average behavior of the processes. We are also interested in obtaining sam- 
ple functions of the inputs and outputs of systems. For linear systems the principal tools 
for signal processing are the convolution and Fourier transform. 

Convolution in discrete time (Eq. (10.48)) is quite simple and so convolution is
the workhorse in linear signal processing. Octave provides several functions for
performing convolutions with discrete-time signals. In Example 10.15 we encountered the
function filter(b,a,x), which implements filtering of the sequence x with an ARMA
filter with coefficients specified by vectors b and a in the following equation:

$$Y_n = -\sum_{i=1}^{q}\alpha_iY_{n-i} + \sum_{j=0}^{p}\beta_jX_{n-j}.$$

Other functions use filter(b,a,x) to provide special cases of filtering. For example,
conv(a,b) convolves the elements in the vectors a and b. We can obtain the output of a
linear system by letting a be the impulse response and b the input random sequence.
The moving average example in Fig. 10.7(b) is easily obtained using conv. Octave
provides other functions implementing specific digital filters.
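For readers without Octave at hand, the two operations can be sketched in plain Python. These are simplified stand-ins, not the Octave implementations: `arma_filter` and `convolve` are hypothetical names, and the sketch assumes the leading denominator coefficient a[0] equals 1 and zero initial conditions.

```python
def arma_filter(b, a, x):
    """y[n] = -sum_{i>=1} a[i] y[n-i] + sum_{j>=0} b[j] x[n-j], assuming a[0] == 1."""
    if a[0] != 1:
        raise ValueError("this sketch assumes a[0] == 1")
    y = []
    for n in range(len(x)):
        acc = sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc)
    return y


def convolve(h, x):
    """Full linear convolution of h and x, of length len(h) + len(x) - 1."""
    out = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            out[i + j] += hi * xj
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(arma_filter([0.5, 0.5], [1], x))             # two-point moving average
    print(convolve([0.5, 0.5], x))                     # same values plus the tail
    print(arma_filter([1], [1, -0.5], [1.0, 0, 0, 0]))  # impulse response of an AR(1) filter
```

The moving average is the special case with no feedback terms, which is why the first len(x) outputs of the convolution coincide with the filter output.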


We can also obtain the output of a linear system in the frequency domain. We take
the FFT of the input sequence $X_n$ and then multiply it by the FFT of the impulse
response, that is, by the sampled transfer function. The inverse FFT then provides the
output $Y_n$ of the linear system. The Octave function fftconv(a,b,n) implements this
approach. The size of the FFT must be equal to the total number of samples in the input
sequence, so this approach is not advisable for long input sequences.
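The frequency-domain route can be sketched as follows. This Python fragment is an illustration under our own assumptions: it uses a direct $O(n^2)$ DFT in place of a true FFT, the names `dft` and `fft_convolve` are hypothetical, and both sequences are zero-padded to length len(h) + len(x) - 1 so that the circular convolution implied by the DFT equals the linear convolution.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT (or inverse DFT) of the sequence x; a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def fft_convolve(h, x):
    """Linear convolution of h and x computed by multiplying zero-padded transforms."""
    n = len(h) + len(x) - 1
    hp = list(h) + [0.0] * (n - len(h))
    xp = list(x) + [0.0] * (n - len(x))
    product = [hk * xk for hk, xk in zip(dft(hp), dft(xp))]
    return [v.real for v in dft(product, inverse=True)]


if __name__ == "__main__":
    print(fft_convolve([0.5, 0.5], [1.0, 2.0, 3.0, 4.0]))
```

The output agrees with direct convolution up to rounding error; without the zero padding, the tail of the result would wrap around onto the first samples.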


10.7.3 Generation of Random Processes


Finally, we are interested in obtaining discrete-time and continuous-time sample func- 
tions of the inputs and outputs of systems. Previous chapters provide us with several tools 
for the generation of random signals that can act as inputs to the systems of interest. 

Section 5.10 provides the method for generating independent pairs of Gaussian
random variables. This method forms the basis for the generation of iid Gaussian
sequences and is implemented in normal_rnd(M,V,Sz). The generation of WSS but
correlated sequences of Gaussian random variables requires more work. One
approach is to use the matrix approaches developed in Section 6.6 to generate
individual vectors with a specified covariance matrix. To generate a vector $\mathbf{Y}$ of $n$ outcomes
with covariance $K_Y$, we perform the following factorization:

$$K_Y = A^TA = P\Lambda P^T,$$

and we generate the vector

$$\mathbf{Y} = A^T\mathbf{X}$$

where $A = \Lambda^{1/2}P^T$ and $\mathbf{X}$ is a vector of iid zero-mean, unit-variance Gaussian random
variables. The Octave function svd(B) performs a singular value decomposition of the
matrix B, see [Long]. When $B = K_Y$ is a positive definite covariance matrix, svd returns
the diagonal matrix $D$ of eigenvalues of $K_Y$ as well as the matrices $U$ and $V$, both equal
to $P$, so that $K_Y = UDV^T$.
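The same idea can be sketched without a linear algebra package if we swap the eigenvalue factorization for a Cholesky factorization $K_Y = LL^T$, which is simple to code by hand; $L$ then plays the role of $A^T$. This substitution is ours, not the text's, and the helper names below are hypothetical.

```python
import math
import random


def toeplitz(r):
    """Symmetric Toeplitz matrix whose first row is r (cf. the Octave function)."""
    n = len(r)
    return [[r[abs(i - j)] for j in range(n)] for i in range(n)]


def cholesky(K):
    """Lower-triangular L with L L^T = K, for a symmetric positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def correlated_gaussian(K, rng):
    """Return Y = L X, where X is iid zero-mean, unit-variance Gaussian."""
    L = cholesky(K)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(K))]
    return [sum(L[i][j] * x[j] for j in range(i + 1)) for i in range(len(K))]


if __name__ == "__main__":
    K = toeplitz([(-0.5) ** k for k in range(8)])
    print(correlated_gaussian(K, random.Random(0)))
```

Since $E[\mathbf{Y}\mathbf{Y}^T] = L\,E[\mathbf{X}\mathbf{X}^T]\,L^T = LL^T$, the generated vector has covariance $K_Y$, just as with the factor obtained from svd.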


Example 10.28 Generation of Correlated Gaussian Random Variables 


Generate 256 samples of the autoregressive process in Example 10.14 with $\alpha = -0.5$, $\sigma_X = 1$.

The autocorrelation of the process is given by $R_X(k) = (-1/2)^{|k|}$. We generate a vector r
of the first 256 lags of $R_X(k)$ and use the function toeplitz(r) to generate the covariance
matrix. We then call svd to obtain $A$. Finally we produce the output vector $\mathbf{Y} = A^T\mathbf{X}$.

> n=[0:255]
> r=(-0.5).^n;                  % first 256 lags of the autocorrelation
> K=toeplitz(r);                % covariance matrix
> [U,D,V]=svd(K);               % K = U*D*V'
> X=normal_rnd(0,1,1,256);      % iid zero-mean, unit-variance Gaussian row vector
> y=V*(D^0.5)*transpose(X);     % Y = P*sqrt(Lambda)*X
> plot(y)

Figure 10.21(a) shows a plot of Y. To check that the sequence has the desired
autocovariance we use the function autocov(X,H), which estimates the autocovariance function of the
sequence X for the first H lag values. Figure 10.21(b) shows the sample correlation coefficient
that is obtained by dividing the autocovariance by the sample variance. The plot shows the
alternating covariance values and the expected peak values of $-0.5$ and 0.25 at the first two lags.

FIGURE 10.21
(a) Correlated Gaussian noise. (b) Sample autocovariance.


An alternative approach to generating a correlated sequence of random variables with 
a specified covariance function is to input an uncorrelated sequence into a linear filter with a 
specific H(f). Equation (10.46) allows us to determine the power spectral density of the out- 
put sequence. This approach can be implemented using convolution and is applicable to ex- 
tremely long signal sequences. A large choice of possible filter functions is available for both 
continuous-time and discrete-time systems. For example, the ARMA model in Example 10.15 
is capable of implementing a broad range of transfer functions. Indeed the entire discussion 
in Section 10.4 was focused on obtaining the transfer function of optimal linear systems in 
various scenarios. 
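As a sketch of the filtering approach, the Python fragment below passes iid Gaussian noise through a first-order recursive filter to obtain a sequence with autocorrelation $\alpha^{|k|}$. The choices here are our assumptions for illustration: the input variance is set to $1 - \alpha^2$ so that the output has unit variance, a short burn-in is discarded to approximate stationarity, and the names `ar1_sequence` and `sample_correlation` are hypothetical.

```python
import random


def ar1_sequence(alpha, n, rng, burn_in=200):
    """Y_n = alpha Y_{n-1} + X_n with X_n iid N(0, 1 - alpha^2), so R_Y(k) = alpha^|k|."""
    sigma = (1.0 - alpha * alpha) ** 0.5
    y, out = 0.0, []
    for i in range(n + burn_in):
        y = alpha * y + rng.gauss(0.0, sigma)
        if i >= burn_in:  # discard the transient caused by starting at zero
            out.append(y)
    return out


def sample_correlation(y, lag):
    """Sample autocovariance at the given lag divided by the sample variance."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y) / n
    ck = sum((y[i] - mean) * (y[i + lag] - mean) for i in range(n - lag)) / n
    return ck / c0


if __name__ == "__main__":
    y = ar1_sequence(-0.5, 5000, random.Random(7))
    print("lag 1:", sample_correlation(y, 1), "lag 2:", sample_correlation(y, 2))
```

Unlike the matrix factorization, this generator needs only constant memory per output sample, which is why the filtering approach suits very long sequences.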


Example 10.29 Generation of White Gaussian Noise 


Find a method for generating white Gaussian noise for a simulation of a continuous-time com- 
munications system. 

The generation of discrete-time white Gaussian noise is trivial and involves the generation 
of a sequence of iid Gaussian random variables. The generation of continuous-time white Gauss- 
ian noise is not so simple. Recall from Example 10.3 that true white noise has infinite bandwidth 
and hence infinite power and so is impossible to realize. Real systems however are bandlimited, 
and hence we always end up dealing with bandlimited white noise. If the system of interest is 
bandlimited to W Hertz, then we need to model white noise limited to W Hz. In Example 10.3 we 


found this type of noise has autocorrelation:

$$R_X(\tau) = \frac{N_0\sin(2\pi W\tau)}{2\pi\tau}.$$

The sampling theorem discussed in Section 10.3 allows us to represent bandlimited white
Gaussian noise as follows:

$$X(t) = \sum_{n=-\infty}^{\infty} X(nT)\,p(t - nT) \quad \text{where } p(t) = \frac{\sin(\pi t/T)}{\pi t/T},$$

where $1/T = 2W$. The coefficients $X(nT)$ have autocorrelation $R_X(nT)$, which is given by:

$$R_X(nT) = \frac{N_0\sin(2\pi WnT)}{2\pi nT} = \frac{N_0\sin(2\pi Wn/2W)}{2\pi n/2W} = \frac{N_0W\sin(\pi n)}{\pi n} = \begin{cases} N_0W & \text{for } n = 0 \\ 0 & \text{for } n \ne 0. \end{cases}$$

We thus conclude that $X(nT)$ is an iid sequence of Gaussian random variables with variance
$N_0W$. Therefore we can simulate sampled bandlimited white Gaussian noise by generating a
sequence $X(nT)$. We can perform any processing required in the discrete-time domain, and we can
then apply the result to an interpolator to recover the continuous-time output.
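The procedure of this example can be sketched in Python as follows. The sketch rests on our own simplifications: the infinite sampling expansion is truncated to the samples actually generated, and the function names and the values of $N_0$ and $W$ in the demo are arbitrary.

```python
import math
import random


def sinc_pulse(t, T):
    """p(t) = sin(pi t / T) / (pi t / T), with p(0) = 1."""
    u = math.pi * t / T
    return 1.0 if u == 0.0 else math.sin(u) / u


def bandlimited_noise_samples(n, N0, W, rng):
    """iid zero-mean Gaussian samples X(nT), T = 1/(2W), each with variance N0 * W."""
    return [rng.gauss(0.0, math.sqrt(N0 * W)) for _ in range(n)]


def interpolate(samples, T, t):
    """Truncated sampling expansion: sum over the available n of X(nT) p(t - nT)."""
    return sum(xn * sinc_pulse(t - n * T, T) for n, xn in enumerate(samples))


if __name__ == "__main__":
    N0, W = 2.0, 4.0
    T = 1.0 / (2.0 * W)
    samples = bandlimited_noise_samples(32, N0, W, random.Random(3))
    print("value at a sample instant:", interpolate(samples, T, 5 * T), samples[5])
    print("value halfway between samples:", interpolate(samples, T, 5.5 * T))
```

At the instants $t = nT$ the interpolator returns the samples themselves, because $p(0) = 1$ and $p(mT) = 0$ for nonzero integers $m$; between samples the truncation introduces an error that is largest near the ends of the record.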


SUMMARY 


• The power spectral density of a WSS process is the Fourier transform of its
autocorrelation function. The power spectral density of a real-valued random process
is a real-valued, nonnegative, even function of frequency.

• The output of a linear, time-invariant system is a WSS random process if its input
is a WSS random process that is applied an infinite time in the past.

• The output of a linear, time-invariant system is a Gaussian WSS random process
if its input is a Gaussian WSS random process.

• Wide-sense stationary random processes with arbitrary rational power spectral
density can be generated by filtering white noise.

• The sampling theorem allows the representation of bandlimited continuous-time
processes by the sequence of periodic samples of the process.

• The orthogonality condition can be used to obtain equations for linear systems that
minimize mean square error. These systems arise in filtering, smoothing, and
prediction problems. Matrix numerical methods are used to find the optimum linear systems.

• The Kalman filter can be used to estimate signals with a structure that keeps the
dimensionality of the algorithm fixed even as the size of the observation set increases.

• The variance of the periodogram estimate for the power spectral density does not
approach zero as the number of samples is increased. An average of several
independent periodograms is required to obtain an estimate whose variance does
approach zero as the number of samples is increased.

• The FFT, convolution, and matrix techniques are basic tools for analyzing,
simulating, and implementing processing of random signals.


CHECKLIST OF IMPORTANT TERMS 


Amplitude modulation
ARMA process
Autoregressive process
Bandpass signal
Causal system
Cross-power spectral density
Einstein-Wiener-Khinchin theorem
Filtering
Impulse response
Innovations
Kalman filter
Linear system
Long-range dependence
Moving average process
Nyquist sampling rate
Optimum filter
Orthogonality condition
Periodogram
Power spectral density
Prediction
Quadrature amplitude modulation
Sampling theorem
Smoothed periodogram
Smoothing
System
Time-invariant system
Transfer function
Unit-sample response
White noise
Wiener filter
Wiener-Hopf equations
Yule-Walker equations


ANNOTATED REFERENCES 


References [1] through [6] contain good discussions of the notion of power spectral 
density and of the response of linear systems to random inputs. References [6] and [7] 
give accessible introductions to the spectral factorization problem. References [7] 
through [9] discuss linear filtering and power spectrum estimation in the context of 
digital signal processing. Reference [10] discusses the basic theory underlying power
spectrum estimation.

PROBLEMS

Section 10.1: Power Spectral Density


10.1. Let $g(x)$ denote the triangular function shown in Fig. P10.1.
(a) Find the power spectral density corresponding to $R_X(\tau) = g(\tau/T)$.
(b) Find the autocorrelation corresponding to the power spectral density
$S_X(f) = g(f/W)$.

FIGURE P10.1 (horizontal axis: $x$, with ticks at $-1$, 0, 1)

10.2. Let $p(x)$ be the rectangular function shown in Fig. P10.2. Is $R_X(\tau) = p(\tau/T)$ a valid
autocorrelation function?

FIGURE P10.2

10.3. (a) Find the power spectral density $S_Y(f)$ of a random process with autocorrelation
function $R_X(\tau)\cos(2\pi f_0\tau)$, where $R_X(\tau)$ is itself an autocorrelation function.
(b) Plot $S_Y(f)$ if $R_X(\tau)$ is as in Problem 10.1a.

10.4. (a) Find the autocorrelation function corresponding to the power spectral density
shown in Fig. P10.3.
(b) Find the total average power.
(c) Plot the power in the range $|f| > f_0$ as a function of $f_0 > 0$.


FIGURE P10.3 (horizontal axis: $f$)

10.5. A random process $X(t)$ has autocorrelation given by $R_X(\tau) = \sigma_X^2e^{-\tau^2/2\alpha^2}$, $\alpha > 0$.
(a) Find the corresponding power spectral density.
(b) Find the amount of power contained in the frequencies $|f| > k/2\pi\alpha$, where
$k = 1, 2, 3$.

10.6. Let $Z(t) = X(t) + Y(t)$. Under what conditions does $S_Z(f) = S_X(f) + S_Y(f)$?

10.7. Show that
(a) $R_{X,Y}(\tau) = R_{Y,X}(-\tau)$.
(b) $S_{X,Y}(f) = S_{Y,X}^*(f)$.

10.8. Let $Y(t) = X(t) - X(t - d)$.
(a) Find $R_{X,Y}(\tau)$ and $S_{X,Y}(f)$.
(b) Find $R_Y(\tau)$ and $S_Y(f)$.

10.9. Do Problem 10.8 if $X(t)$ has the triangular autocorrelation function $g(\tau/T)$ in Problem
10.1 and Fig. P10.1.

10.10. Let $X(t)$ and $Y(t)$ be independent wide-sense stationary random processes, and define
$Z(t) = X(t)Y(t)$.
(a) Show that $Z(t)$ is wide-sense stationary.
(b) Find $R_Z(\tau)$ and $S_Z(f)$.

10.11. In Problem 10.10, let $X(t) = a\cos(2\pi f_0t + \Theta)$ where $\Theta$ is a uniform random variable in
$(0, 2\pi)$. Find $R_Z(\tau)$ and $S_Z(f)$.

10.12. Let $R_X(k) = 4\alpha^{|k|}$, $|\alpha| < 1$.
(a) Find $S_X(f)$.
(b) Plot $S_X(f)$ for $\alpha = 0.25$ and $\alpha = 0.75$, and comment on the effect of the value of $\alpha$.

10.13. Let $R_X(k) = 4(\alpha)^{|k|} + 16(\beta)^{|k|}$, $\alpha < 1$, $\beta < 1$.
(a) Find $S_X(f)$.
(b) Plot $S_X(f)$ for $\alpha = \beta = 0.5$ and $\alpha = 0.75 = 3\beta$ and comment on the effect of the value
of $\alpha/\beta$.

10.14. Let $R_X(k) = 9(1 - |k|/N)$, for $|k| < N$ and 0 elsewhere. Find and plot $S_X(f)$.

10.15. Let $X_n = \cos(2\pi f_0n + \Theta)$, where $\Theta$ is a uniformly distributed random variable in the
interval $(0, 2\pi)$. Find and plot $S_X(f)$ for $f_0 = 0.5, 1, 1.75, \pi$.

10.16. Let $D_n = X_n - X_{n-d}$, where $d$ is an integer constant and $X_n$ is a zero-mean, WSS
random process.
(a) Find $R_D(k)$ and $S_D(f)$ in terms of $R_X(k)$ and $S_X(f)$. What is the impact of $d$?
(b) Find $E[D_n^2]$.

10.17. Find $R_D(k)$ and $S_D(f)$ in Problem 10.16 if $X_n$ is the moving average process of Example
10.7 with $\alpha = 1$.

10.18. Let $X_n$ be a zero-mean, bandlimited white noise random process with $S_X(f) = 1$ for
$|f| < f_c$ and 0 elsewhere, where $f_c < 1/2$.
(a) Show that $R_X(k) = \sin(2\pi f_ck)/(\pi k)$.
(b) Find $R_X(k)$ when $f_c = 1/4$.

10.19. Let $W_n$ be a zero-mean white noise sequence, and let $X_n$ be independent of $W_n$.
(a) Show that $Y_n = W_nX_n$ is a white sequence, and find $\sigma_Y^2$.
(b) Suppose $X_n$ is a Gaussian random process with autocorrelation $R_X(k) = (1/2)^{|k|}$.
Specify the joint pmf's for $Y_n$.


10.20. Evaluate the periodogram estimate for the random process $X(t) = a\cos(2\pi f_0t + \Theta)$,
where $\Theta$ is a uniformly distributed random variable in the interval $(0, 2\pi)$. What
happens as $T \to \infty$?

10.21. (a) Show how to use the FFT to calculate the periodogram estimate in Eq. (10.32).
(b) Generate four realizations of an iid zero-mean unit-variance Gaussian sequence of
length 128. Calculate the periodogram.
(c) Calculate 50 periodograms as in part b and show the average of the periodograms
after every 10 additional realizations.


Section 10.2: Response of Linear Systems to Random Signals 


10.22. 


10.23. 


10.24. 
10.25. 


10.26. 


10.27. 


10.28. 


10.29. 


Let X(t) be a differentiable WSS random process, and define 


$Y(t) = \frac{d}{dt}X(t).$


Find an expression for $S_Y(f)$ and $R_Y(\tau)$. Hint: For this system, $H(f) = j2\pi f$.

Let $Y(t)$ be the derivative of $X(t)$, a bandlimited white noise process as in Example 10.3.
(a) Find $S_Y(f)$ and $R_Y(\tau)$.

(b) What is the average power of the output? 

Repeat Problem 10.23 if X(t) has Sy(f) = Brett. 

Let Y(t) be a short-term integration of X(t): 


$Y(t) = \frac{1}{T}\int_{t-T}^{t} X(t')\,dt'.$


(a) Find the impulse response h(t) and the transfer function H(f). 

(b) Find $S_Y(f)$ in terms of $S_X(f)$.

In Problem 10.25, let $R_X(\tau) = (1 - |\tau|/T)$ for $|\tau| < T$ and zero elsewhere.

(a) Find $S_Y(f)$.

(b) Find $R_Y(\tau)$.

(c) Find $E[Y^2(t)]$.

The input into a filter is zero-mean white noise with noise power density $N_0/2$. The filter has transfer function

$H(f) = \frac{1}{1 + j2\pi f}.$

(a) Find $S_{Y,X}(f)$ and $R_{Y,X}(\tau)$.

(b) Find $S_Y(f)$ and $R_Y(\tau)$.

(c) What is the average power of the output? 

A bandlimited white noise process X(t) is input into a filter with transfer function 

$H(f) = 1 + j2\pi f.$

(a) Find $S_{Y,X}(f)$ and $R_{Y,X}(\tau)$ in terms of $R_X(\tau)$ and $S_X(f)$.

(b) Find $S_Y(f)$ and $R_Y(\tau)$ in terms of $R_X(\tau)$ and $S_X(f)$.

(c) What is the average power of the output? 

(a) A WSS process X(t) is applied to a linear system at t = 0. Find the mean and auto- 
correlation function of the output process. Show that the output process becomes 
WSS as $t \to \infty$.
10.30.


10.31. 


10.32. 


10.33. 


10.34. 


10.35. 


Let $Y(t)$ be the output of a linear system with impulse response $h(t)$ and input $X(t)$. Find $R_{Y,X}(\tau)$ when the input is white noise. Explain how this result can be used to estimate the impulse response of a linear system.


(a) A WSS Gaussian random process $X(t)$ is applied to two linear systems as shown in Fig. P10.4. Find an expression for the joint pdf of $Y(t_1)$ and $W(t_2)$.


(b) Evaluate part a if X(t) is white Gaussian noise. 


FIGURE P10.4


Repeat Problem 10.31b if $h_1(t)$ and $h_2(t)$ are ideal bandpass filters as in Example 10.11. Show that $Y(t)$ and $W(t)$ are independent random processes if the filters have nonoverlapping bands.

Let $Y(t) = h(t) * X(t)$ and $Z(t) = X(t) - Y(t)$ as shown in Fig. P10.5.

(a) Find $S_Z(f)$ in terms of $S_X(f)$.

(b) Find $E[Z^2(t)]$.


FIGURE P10.5


Let $Y(t)$ be the output of a linear system with impulse response $h(t)$ and input $X(t) + N(t)$. Let $Z(t) = X(t) - Y(t)$.

(a) Find $R_{X,Y}(\tau)$ and $R_Z(\tau)$.

(b) Find $S_Z(f)$.

(c) Find $S_Z(f)$ if $X(t)$ and $N(t)$ are independent random processes.


A random telegraph signal is passed through an ideal lowpass filter with cutoff frequency 
W. Find the power spectral density of the difference between the input and output of the 
filter. Find the average power of the difference signal. 


10.36. 


10.37. 


10.38. 


10.39. 


10.40. 


10.41. 


10.42. 




Let $Y(t) = a\cos(2\pi f_c t + \Theta) + N(t)$ be applied to an ideal bandpass filter that passes the frequencies $|f - f_c| < W/2$. Assume that $\Theta$ is uniformly distributed in $(0, 2\pi)$. Find the ratio of signal power to noise power at the output of the filter.

Let $Y_n = (X_{n+1} + X_n + X_{n-1})/3$ be a “smoothed” version of $X_n$. Find $R_Y(k)$, $S_Y(f)$, and $E[Y_n^2]$.

Suppose $X_n$ is a white Gaussian noise process in Problem 10.37. Find the joint pmf for $(Y_n, Y_{n+1}, Y_{n+2})$.

Let $Y_n = X_n + \beta X_{n-1}$, where $X_n$ is a zero-mean, first-order autoregressive process with autocorrelation $R_X(k) = \sigma^2\alpha^{|k|}$, $|\alpha| < 1$.

(a) Find $R_{Y,X}(k)$ and $S_{Y,X}(f)$.

(b) Find $S_Y(f)$, $R_Y(k)$, and $E[Y_n^2]$.

(c) For what value of $\beta$ is $Y_n$ a white noise process?


A zero-mean white noise sequence is input into a cascade of two systems (see Fig. P10.6). System 1 has impulse response $h_n = (1/2)^n u(n)$ and system 2 has impulse response $g_n = (1/4)^n u(n)$, where $u(n) = 1$ for $n \ge 0$ and 0 elsewhere.

(a) Find $S_Y(f)$ and $S_Z(f)$.

(b) Find $R_{W,Y}(k)$ and $R_{W,Z}(k)$; find $S_{W,Y}(f)$ and $S_{W,Z}(f)$. Hint: Use a partial fraction expansion of $S_{W,Z}(f)$ prior to finding $R_{W,Z}(k)$.

(c) Find $E[Z_n^2]$.


FIGURE P10.6 


A moving average process $X_n$ is produced as follows:


$X_n = \alpha_0 W_n + \alpha_1 W_{n-1} + \cdots + \alpha_p W_{n-p},$


where $W_n$ is a zero-mean white noise process.

(a) Show that $R_X(k) = 0$ for $|k| > p$.

(b) Find $R_X(k)$ by computing $E[X_{n+k}X_n]$, then find $S_X(f) = \mathcal{F}\{R_X(k)\}$.

(c) Find the impulse response $h_n$ of the linear system that defines the moving average process. Find the corresponding transfer function $H(f)$, and then $S_X(f)$. Compare your answer to part b.


Consider the second-order autoregressive process defined by

$Y_n = \frac{3}{4}Y_{n-1} - \frac{1}{8}Y_{n-2} + W_n,$

where the input $W_n$ is a zero-mean white noise process.


(a) Verify that the unit-sample response is $h_n = 2(1/2)^n - (1/4)^n$ for $n \ge 0$, and 0 otherwise.

(b) Find the transfer function.
(c) Find $S_Y(f)$ and $R_Y(k) = \mathcal{F}^{-1}\{S_Y(f)\}$.
10.43.


10.44. 


Suppose the autoregressive process defined in Problem 10.42 is the input to the following 
moving average system: 
$Z_n = Y_n - \frac{1}{4}Y_{n-1}.$


(a) Find $S_Z(f)$ and $R_Z(k)$.


(b) Explain why $Z_n$ is a first-order autoregressive process.


(c) Find a moving average system that will produce a white noise sequence when $Z_n$ is the input.
An autoregressive process $Y_n$ is produced as follows:


$Y_n = \alpha_1 Y_{n-1} + \cdots + \alpha_q Y_{n-q} + W_n,$


where $W_n$ is a zero-mean white noise process.
(a) Show that the autocorrelation of $Y_n$ satisfies the following set of equations:


$R_Y(0) = \sum_{i=1}^{q} \alpha_i R_Y(i) + R_W(0)$


$R_Y(k) = \sum_{i=1}^{q} \alpha_i R_Y(k - i).$


(b) Use these recursive equations to compute the autocorrelation of the process in 
Example 10.22. 


Section 10.3: Bandlimited Random Processes 


10.45. 


10.46. 


10.47. 


(a) Show that the signal x(t) is recovered in Figure 10.10(b) as long as the sampling rate 
is above the Nyquist rate. 

(b) Suppose that a deterministic signal is sampled at a rate below the Nyquist rate. 
Use Fig. 10.10(b) to show that the recovered signal contains additional signal com- 
ponents from the adjacent bands. The error introduced by these components is 
called aliasing. 

(c) Find an expression for the power spectral density of the sampled bandlimited ran- 
dom process X(t). 

(d) Find an expression for the power in the aliasing error components. 

(e) Evaluate the power in the error signal in part c if Sy(f) is as in Problem 10.1b. 

An ideal discrete-time lowpass filter has transfer function: 


$H(f) = \begin{cases} 1 & \text{for } |f| < f_c < 1/2 \\ 0 & \text{for } f_c < |f| < 1/2. \end{cases}$


(a) Show that $H(f)$ has impulse response $h_n = \sin(2\pi f_c n)/\pi n$.


(b) Find the power spectral density of Y(kT) that results when the signal in Problem 
10.1b is sampled at the Nyquist rate and processed by the filter in part a. 


(c) Let Y(t) be the continuous-time signal that results when the output of the filter in 
part b is fed to an interpolator operating at the Nyquist rate. Find Sy(f). 

In order to design a differentiator for bandlimited processes, the filter in Fig. 10.10(c) is 

designed to have transfer function: 


$H(f) = j2\pi f/T$ for $|f| < 1/2$.


10.48. 


10.49. 


10.50. 


10.51. 


10.52. 


10.53. 


10.54. 


10.55. 




(a) Show that the corresponding impulse response is:


$h_0 = 0, \qquad h_n = \frac{\pi n\cos\pi n - \sin\pi n}{\pi n^2 T} = \frac{(-1)^n}{nT}, \quad n \ne 0.$


(b) Suppose that $X(t) = a\cos(2\pi f_0 t + \Theta)$ is sampled at a rate $1/T = 4f_0$ and then input into the above digital filter. Find the output $Y(t)$ of the interpolator.

Complete the proof of the sampling theorem by showing that the mean square error is zero. Hint: First show that $E[(X(t) - \hat{X}(t))X(kT)] = 0$, all $k$.

Plot the power spectral density of the amplitude modulated signal $Y(t)$ in Example 10.18, assuming $f_c > W$; $f_c < W$. Assume that $A(t)$ is the signal in Problem 10.1b.

Suppose that a random telegraph signal with transition rate $\alpha$ is the input signal in an amplitude modulation system. Plot the power spectral density of the modulated signal assuming $f_c = \alpha/\pi$ and $f_c = 10\alpha/\pi$.

Let the input to an amplitude modulation system be $2\cos(2\pi f_1 t + \Phi)$, where $\Phi$ is uniformly distributed in $(-\pi, \pi)$. Find the power spectral density of the modulated signal assuming $f_c > f_1$.

Find the signal-to-noise ratio in the recovered signal in Example 10.18 if Sy(f) = af? for 

|f + f.| < W and zero elsewhere. 

The input signals to a QAM system are independent random processes with power spec- 

tral densities shown in Fig. P10.7. Sketch the power spectral density of the QAM signal. 


FIGURE P10.7


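Under what conditions does the receiver shown in Fig. P10.8 recover the input signals to a QAM signal?


FIGURE P10.8 (receiver: $x(t)$ is multiplied by $2\cos(2\pi f_c t + \Theta)$ and by $2\sin(2\pi f_c t + \Theta)$, and each product is passed through a lowpass filter, LPF)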


Show that Eq. (10.67b) implies that Sz 4(f) is a purely imaginary, odd function of f. 




Section 10.4: Optimum Linear Systems 


10.56. 


10.57. 


10.58. 


10.59. 


10.60. 


10.61. 


10.62. 


10.63. 


Let $X_n = Z_n + N_n$ as in Example 10.22, where $Z_n$ is a first-order process with $R_Z(k) = 4(3/4)^{|k|}$ and $N_n$ is white noise with $\sigma_N^2 = 1$.

(a) Find the optimum $p = 1$ filter for estimating $Z_n$.

(b) Find the mean square error of the resulting filter.

Let $X_n = Z_n + N_n$ as in Example 10.21, where $Z_n$ has $R_Z(k) = \sigma_Z^2 r_1^{|k|}$ and $N_n$ has $R_N(k) = \sigma_N^2 r_2^{|k|}$, where $r_1$ and $r_2$ are less than one in magnitude.

(a) Find the equation for the optimum filter for estimating $Z_n$.

(b) Write the matrix equation for the filter coefficients.

(c) Solve the $p = 2$ case, if $\sigma_Z^2 = 9$, $r_1 = 2/3$, $\sigma_N^2 = 1$, and $r_2 = 1/3$.

(d) Find the mean square error for the optimum filter in part c.

(e) Use the matrix function of Octave to solve parts c and d for $p = 3, 4, 5$.


Let $X_n = Z_n + N_n$ as in Example 10.21, where $Z_n$ is the first-order moving average process of Example 10.7, and $N_n$ is white noise.


(a) Find the equation for the optimum filter for estimating $Z_n$.


(b) For the p = 1 and p = 2 cases, write and solve the matrix equation for the filter co- 
efficients. 


(c) Find the mean square error for the optimum filter in part b. 


Let $X_n = Z_n + N_n$ as in Example 10.19, and suppose that an estimator for $Z_n$ uses observations from the following time instants: $I = \{n - p, \ldots, n, \ldots, n + p\}$.

(a) Solve the $p = 1$ case if $Z_n$ and $N_n$ are as in Problem 10.56.

(b) Find the mean square error in part a. 

(c) Find the equation for the optimum filter. 

(d) Write the matrix equation for the 2p + 1 filter coefficients. 

(e) Use the matrix function of Octave to solve parts a and b for p = 2,3. 

Consider the predictor in Eq. (10.86b). 

(a) Find the optimum predictor coefficients in the $p = 2$ case when $R_Z(k) = 9(1/3)^{|k|}$.
(b) Find the mean square error in part a. 

(c) Use the matrix function of Octave to solve parts a and b for p = 3,4, 5. 

Let X(t) be a WSS, continuous-time process. 

(a) Use the orthogonality principle to find the best estimator for X(t) of the form 


$\hat{X}(t) = aX(t_1) + bX(t_2),$


where $t_1$ and $t_2$ are given time instants.
(b) Find the mean square error of the optimum estimator.
(c) Check your work by evaluating the answer in part b for $t = t_1$ and $t = t_2$. Is the answer what you would expect?
Find the optimum filter and its mean square error in Problem 10.61 if $t_1 = t - d$ and $t_2 = t + d$.
Find the optimum filter and its mean square error in Problem 10.61 if $t_1 = t - d$ and $t_2 = t - 2d$, and $R_X(\tau) = e^{-\alpha|\tau|}$. Compare the performance of this filter to the performance of the optimum filter of the form $\hat{X}(t) = aX(t - d)$.


10.64. 


10.65. 


10.66. 


10.67. 


10.68. 


10.69. 


10.70. 


10.71. 


10.72. 




Modify the system in Problem 10.33 to obtain a model for the estimation error in the optimum infinite-smoothing filter in Example 10.24. Use the model to find an expression for the power spectral density of the error $e(t) = Z(t) - Y(t)$, and then show that the mean square error is given by:


$E[e^2(t)] = \int_{-\infty}^{\infty} \frac{S_Z(f)S_N(f)}{S_Z(f) + S_N(f)}\,df.$


Hint: $E[e^2(t)] = R_e(0)$.
Solve the infinite-smoothing problem in Example 10.24 if $Z(t)$ is the random telegraph signal with $\alpha = 1/2$ and $N(t)$ is white noise. What is the resulting mean square error?


Solve the infinite-smoothing problem in Example 10.24 if $Z(t)$ is bandlimited white noise of density $N_1/2$ and $N(t)$ is (infinite-bandwidth) white noise of noise density $N_0/2$. What is the resulting mean square error?


Solve the infinite-smoothing problem in Example 10.24 if $Z(t)$ and $N(t)$ are as given in Example 10.25. Find the resulting mean square error.


Let $X_n = Z_n + N_n$, where $Z_n$ and $N_n$ are independent, zero-mean random processes.


(a) Find the smoothing filter given by Eq. (10.89) when $Z_n$ is a first-order autoregressive process with $\sigma_Z^2 = 9$ and $\alpha = 1/2$ and $N_n$ is white noise with $\sigma_N^2 = 4$.


(b) Use the approach in Problem 10.64 to find the power spectral density of the error $S_e(f)$.


(c) Find R,(k) as follows: Let Z = e/?"/, factor the denominator S,(f), and take the in- 
verse transform to show that: 


2 

OXZ 
all where 0< z <1. 
a(1 — zz) 


RAk) = 
(d) Find an expression for the resulting mean square error. 
Find the Wiener filter in Example 10.25 if N(t) is white noise of noise density No/2 = 1/3 
and Z(t) has power spectral density 
4 


SAf) = a+ 4p 


Find the mean square error for the Wiener filter found in Example 10.25. Compare this 


with the mean square error of the infinite-smoothing filter found in Problem 10.67. 
Suppose we wish to estimate (predict) $X(t + d)$ by


$\hat{X}(t + d) = \int_0^{\infty} h(\tau)X(t - \tau)\,d\tau.$

(a) Show that the optimum filter must satisfy

$R_X(\tau + d) = \int_0^{\infty} h(x)R_X(\tau - x)\,dx, \qquad \tau \ge 0.$
(b) Use the Wiener-Hopf method to find the optimum filter when Ry(r) = e”, 


Let $X_n = Z_n + N_n$, where $Z_n$ and $N_n$ are independent random processes, $N_n$ is a white noise process with $\sigma_N^2 = 1$, and $Z_n$ is a first-order autoregressive process with $R_Z(k) = 4(1/2)^{|k|}$. We are interested in the optimum filter for estimating $Z_n$ from $X_n, X_{n-1}, \ldots$.
(a) Find $S_X(f)$ and express it in the form:


1, Let) 1 zel 
2z1 Zi 
(1 R Lerm )(1 = ser) 
2 2 


Sx(f) 


(b) Find the whitening causal filter. 
(c) Find the optimal causal filter. 


Section 10.5: The Kalman Filter 


10.73. 


10.74. 
10.75. 
10.76. 


If $W_n$ and $N_n$ are Gaussian random processes in Eq. (10.102), are $Z_n$ and $X_n$ Markov processes?


Derive Eq. (10.120) for the mean square prediction error.

Repeat Example 10.26 with $a = 0.5$ and $a = 2$.

Find the Kalman algorithm for the case where the observations are given by

$X_n = b_n Z_n + N_n,$

where $b_n$ is a sequence of known constants.


*Section 10.6: Estimating the Power Spectral Density 


10.77. 


10.78. 


10.79. 


10.80. 


Verify Eqs. (10.125) and (10.126) for the periodogram and the autocorrelation function 
estimate. 


Generate a sequence $X_n$ of iid random variables that are uniformly distributed in (0, 1).

(a) Compute several 128-point periodograms and verify the random behavior of the pe- 
riodogram as a function of f. Does the periodogram vary about the true power spec- 
tral density? 

(b) Compute the smoothed periodogram based on 10, 20, and 50 independent peri- 
odograms. Compare the smoothed periodograms to the true power spectral density. 

Repeat Problem 10.78 with $X_n$ a first-order autoregressive process with autocorrelation function: $R_X(k) = (.9)^{|k|}$; $R_X(k) = (1/2)^{|k|}$; $R_X(k) = (.1)^{|k|}$.

Consider the following estimator for the autocorrelation function 


1 k—\m\-1 


` XnXn+m- 


NE ele eh 


Show that if we estimate the power spectrum of X,, by the Fourier transform of 7;,(m), 
the resulting estimator has mean 


EID È Rx’ je Prim, 


Why is the estimator biased? 


Section 10.7: Numerical Techniques for Processing Random Signals 


10.81. 


Let X(t) have power spectral density given by Sx(f) = B’e! / se V 27. 
(a) Before performing an FFT of Sy(f), you are asked to calculate the power in the 
aliasing error if the signal is treated as if it were bandlimited with bandwidth kW. 


10.82. 


10.83. 


10.84. 


10.85. 


10.86. 


10.87. 


What value of $W$ should be used for the FFT if the power in the aliasing error is to be less than 1% of the total power? Assume $W_0 = 1000$ and $\beta = 1$.

(b) Suppose you are to perform $N = 2^M$ point FFT of $S_X(f)$. Explore how $W$, $T$, and $t_0$ vary as a function of $f_0$. Discuss what leeway is afforded by increasing $N$.

(c) For the value of $W$ in part a, identify the values of the parameters $f_0$, $T$, and $t_0$ for $N = 128, 256, 512, 1024$.

(d) Find the autocorrelation $\{R_X(kt_0)\}$ by applying the FFT to $S_X(f)$. Try the options identified in part c and comment on the accuracy of the results by comparing them to the exact value of $R_X(\tau)$.


Use the FFT to calculate and plot $S_X(f)$ for the following discrete-time processes:

(a) $R_X(k) = 4\alpha^{|k|}$, for $\alpha = 0.25$ and $\alpha = 0.75$.
(b) $R_X(k) = 4(1/2)^{|k|} + 16(1/4)^{|k|}$.
(c) $X_n = \cos(2\pi f_0 n + \Theta)$, where $\Theta$ is uniformly distributed in $(0, 2\pi]$ and $f_0 = 1000$.


Use the FFT to calculate and plot $R_X(k)$ for the following discrete-time processes:

(a) $S_X(f) = 1$ for $|f| < f_c$ and 0 elsewhere, where $f_c = 1/8, 1/4, 3/8$.
(b) $S_X(f) = 1/2 + 1/2\cos 2\pi f$ for $|f| < 1/2$.


Use the FFT to find the output power spectral density in the following systems: 


(a) Input $X_n$ with $R_X(k) = 4\alpha^{|k|}$, for $\alpha = 0.25$, $H(f) = 1$ for $|f| < 1/4$.

(b) Input $X_n = \cos(2\pi f_0 n + \Theta)$, where $\Theta$ is a uniformly distributed random variable and $H(f) = j2\pi f$ for $|f| < 1/2$.

(c) Input $X_n$ with $R_X(k)$ as in Problem 10.14 with $N = 3$ and $H(f) = 1$ for $|f| < 1/2$.

(a) Show that

$R_X(\tau) = 2\,\mathrm{Re}\left\{\int_0^{\infty} S_X(f)e^{j2\pi f\tau}\,df\right\}.$

(b) Use approximations to express the above as a DFT relating $N$ points in the time domain to $N$ points in the frequency domain.

(c) Suppose we meet the $t_0 f_0 = 1/N$ requirement by letting $t_0 = f_0 = 1/\sqrt{N}$. Compare this to the approach leading to Eq. (10.142).


(a)


(b)


(c)


(a)


(b)

Generate a sequence of 1024 zero-mean unit-variance Gaussian random variables 
and pass it through a system with impulse response h,, = e 7” for n = 0. 

Estimate the autocovariance of the output process of the digital filter and compare 
it to the theoretical autocovariance. 

What is the pdf of the continuous-time process that results if the output of the digi- 
tal filter is fed into an interpolator? 

Use the covariance matrix factorization approach to generate a sequence of 1024 
Gaussian samples with autocovariance h(t) = e”, 

Estimate the autocovariance of the observed sequence and compare to the theoret- 
ical result. 


Problems Requiring Cumulative Knowledge 


10.88. 


10.89. 


Does the pulse amplitude modulation signal in Example 9.38 have a power spectral den- 
sity? Explain why or why not. If the answer is yes, find the power spectral density. 


Compare the operation and performance of the Wiener and Kalman filters for the signals 
discussed in Example 10.26. 
10.90.


10.91. 


10.92. 


(a) Find the power spectral density of the ARMA process in Example 10.15 by finding the transfer function of the associated linear system.

(b) For the ARMA process find the cross-power spectral density from $E[Y_n X_m]$, and then the power spectral density from $E[Y_n Y_m]$.


Let $X_1(t)$ and $X_2(t)$ be jointly WSS and jointly Gaussian random processes that are input into two linear time-invariant systems as shown below:


$X_1(t) \to h_1(t) \to Y_1(t)$


$X_2(t) \to h_2(t) \to Y_2(t)$


(a) Find the cross-correlation function of $Y_1(t)$ and $Y_2(t)$. Find the corresponding cross-power spectral density.

(b) Show that $Y_1(t)$ and $Y_2(t)$ are jointly WSS and jointly Gaussian random processes.

(c) Suppose that the transfer functions of the above systems are nonoverlapping, that is, $|H_1(f)||H_2(f)| = 0$. Show that $Y_1(t)$ and $Y_2(t)$ are independent random processes.

(d) Now suppose that $X_1(t)$ and $X_2(t)$ are nonstationary jointly Gaussian random processes. Which of the above results still hold?


Consider the communication system in Example 9.38 where the transmitted signal $X(t)$ consists of a sequence of pulses that convey binary information. Suppose that the pulses $p(t)$ are given by the impulse response of the ideal lowpass filter in Figure 10.6. The signal that arrives at the receiver is $Y(t) = X(t) + N(t)$, which is to be sampled and processed digitally.


(a) At what rate should $Y(t)$ be sampled?
(b) How should the bit carried by each pulse be recovered based on the samples $Y(nT)$?
(c) What is the probability of error in this system?


CHAPTER 11


Markov Chains


In general, the random variables within the family defining a stochastic process are not 
independent, and in fact can be statistically dependent in very complex ways. In this 
chapter we introduce the class of Markov random processes that have a simple form of 
dependence and that are quite useful in modeling many problems found in practice. 
We concentrate on integer-valued Markov processes, which are called Markov chains. 


Section 11.1 introduces Markov processes and the special case of Markov chains. Section 11.2 considers discrete-time Markov chains and examines the behavior of their state probabilities over time.

Section 11.3 discusses structural properties of discrete-time Markov chains that determine their long-term behavior and limiting state probabilities.

Section 11.4 introduces continuous-time Markov chains and considers the transient as well as long-term behavior of their state probabilities.

Section 11.5 considers time-reversed Markov chains and develops interesting properties of reversible Markov chains that look the same going forwards and backwards in time.

Finally, Section 11.6 introduces methods for simulating discrete-time and continuous-time Markov chains.


11.1 MARKOV PROCESSES


A random process $X(t)$ is a Markov process if the future of the process given the present is independent of the past, that is, if for arbitrary times $t_1 < t_2 < \cdots < t_k < t_{k+1}$,


$P[X(t_{k+1}) = x_{k+1} \mid X(t_k) = x_k, \ldots, X(t_1) = x_1] = P[X(t_{k+1}) = x_{k+1} \mid X(t_k) = x_k]$ (11.1)


if $X(t)$ is discrete-valued, and


$P[a < X(t_{k+1}) \le b \mid X(t_k) = x_k, \ldots, X(t_1) = x_1] = P[a < X(t_{k+1}) \le b \mid X(t_k) = x_k]$ (11.2a)
FIGURE 11.1
Markov property: Given $X(t_k)$, $X(t_{k+1})$ is independent of samples prior to $t_k$.


if $X(t)$ is continuous-valued. If the samples of $X(t)$ are jointly continuous, then Eq. (11.2a) is equivalent to


$f_{X(t_{k+1})}(x_{k+1} \mid X(t_k) = x_k, \ldots, X(t_1) = x_1) = f_{X(t_{k+1})}(x_{k+1} \mid X(t_k) = x_k).$ (11.2b)


We refer to Eqs. (11.1) and (11.2) as the Markov property. In the above expressions $t_k$ is the “present,” $t_{k+1}$ is the “future,” and $t_1, \ldots, t_{k-1}$ is the “past,” as shown in Fig. 11.1. Thus in Markov processes, pmf’s and pdf’s that are conditioned on several time instants always reduce to a pmf/pdf that is conditioned only on the most recent time instant. For this reason we refer to the value of $X(t)$ at time $t$ as the state of the process at time $t$.


Example 11.1 Sum Process 


Consider the sum process discussed in Section 9.3:


$S_n = X_1 + X_2 + \cdots + X_n = S_{n-1} + X_n,$


where the $X_i$’s are an iid sequence of random variables and where $S_0 = 0$. $S_n$ is a Markov process since


$P[S_{n+1} = s_{n+1} \mid S_n = s_n, \ldots, S_1 = s_1] = P[X_{n+1} = s_{n+1} - s_n] = P[S_{n+1} = s_{n+1} \mid S_n = s_n].$


The binomial counting process and the random walk processes introduced in Section 9.3 
are sum processes and therefore Markov processes. 


Example 11.2 Moving Average 


Consider the moving average of a Bernoulli sequence:


$Y_n = \frac{1}{2}(X_n + X_{n-1}),$

where the $X_i$ are an independent Bernoulli sequence with $p = 1/2$. We now show that $Y_n$ is not a Markov process.

The pmf of $Y_n$ is


$P[Y_n = 0] = P[X_n = 0, X_{n-1} = 0] = \frac{1}{4},$

$P[Y_n = 1/2] = P[X_n = 0, X_{n-1} = 1] + P[X_n = 1, X_{n-1} = 0] = \frac{1}{2},$

and

$P[Y_n = 1] = P[X_n = 1, X_{n-1} = 1] = \frac{1}{4}.$

Now consider the following conditional probability for two consecutive values of $Y_n$:


$P[Y_n = 1 \mid Y_{n-1} = 1/2] = \frac{P[Y_n = 1, Y_{n-1} = 1/2]}{P[Y_{n-1} = 1/2]} = \frac{P[X_n = 1, X_{n-1} = 1, X_{n-2} = 0]}{1/2} = \frac{(1/2)^3}{1/2} = \frac{1}{4}.$


Now suppose we have additional knowledge about the past:

$P[Y_n = 1 \mid Y_{n-1} = 1/2, Y_{n-2} = 1] = \frac{P[Y_n = 1, Y_{n-1} = 1/2, Y_{n-2} = 1]}{P[Y_{n-1} = 1/2, Y_{n-2} = 1]} = 0,$


since no sequence of $X_n$’s leads to the sequence 1, 1/2, 1. Thus


$P[Y_n = 1 \mid Y_{n-1} = 1/2, Y_{n-2} = 1] \ne P[Y_n = 1 \mid Y_{n-1} = 1/2],$


and the process is not Markov.
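
The two conditional probabilities in Example 11.2 can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original example; the function name `cond_prob` and its lag-based arguments are illustrative assumptions. It lists all 16 equally likely patterns of four consecutive fair Bernoulli bits and counts those consistent with the conditioning event.

```python
from fractions import Fraction
from itertools import product

def cond_prob(target, given):
    """P[target | given] for the moving average Y_n = (X_n + X_{n-1}) / 2.

    `target` and `given` map a lag k >= 0 to a required value of Y_{n-k}.
    X_n, ..., X_{n-3} are fair Bernoulli bits, so all 16 patterns are
    equally likely and probabilities reduce to counting.
    """
    num = den = 0
    for x in product((0, 1), repeat=4):   # x[k] plays the role of X_{n-k}
        y = {k: Fraction(x[k] + x[k + 1], 2) for k in range(3)}  # y[k] = Y_{n-k}
        if all(y[k] == v for k, v in given.items()):
            den += 1
            if all(y[k] == v for k, v in target.items()):
                num += 1
    return Fraction(num, den)

half = Fraction(1, 2)
p_one_step = cond_prob({0: 1}, {1: half})          # P[Y_n = 1 | Y_{n-1} = 1/2]
p_two_step = cond_prob({0: 1}, {1: half, 2: 1})    # ... and also Y_{n-2} = 1
print(p_one_step, p_two_step)
```

The two values differ, which is exactly the failure of the Markov property shown above.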


Example 11.3 Poisson Process 
The Poisson process is a continuous-time Markov process since

$P[N(t_{k+1}) = j \mid N(t_k) = i, N(t_{k-1}) = x_{k-1}, \ldots, N(t_1) = x_1]$
$= P[j - i \text{ events in } t_{k+1} - t_k \text{ seconds}]$
$= P[N(t_{k+1}) = j \mid N(t_k) = i].$


Example 11.4 Random Telegraph 
The random telegraph signal of Example 9.24 is a continuous-time Markov process since

$P[X(t_{k+1}) = a \mid X(t_k) = b, \ldots, X(t_1) = x_1]$
$= P[\text{even (odd) number of jumps in } t_{k+1} - t_k \text{ seconds if } a = b\ (a \ne b)]$
$= P[X(t_{k+1}) = a \mid X(t_k) = b].$


Example 11.5 Wiener Process


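The Wiener process, from Section 9.5, is a Markov process. Since it satisfies the independent increments property (Eq. 9.52), we have that:


$f_{X(t_{k+1})}(x_{k+1} \mid X(t_k) = x_k, \ldots, X(t_1) = x_1) = f_{X(t_{k+1}) - X(t_k)}(x_{k+1} - x_k) = \frac{\exp\{-(x_{k+1} - x_k)^2 / [2\alpha(t_{k+1} - t_k)]\}}{\sqrt{2\pi\alpha(t_{k+1} - t_k)}}.$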


The Wiener process is Gaussian and so it provides an example of a Gaussian Markov process. 


An integer-valued Markov random process is called a Markov chain.¹ In the remainder of this chapter we concentrate on Markov chains.
If $X(t)$ is a Markov chain, then the joint pmf for three arbitrary time instants is


$P[X(t_3) = x_3, X(t_2) = x_2, X(t_1) = x_1]$

$= P[X(t_3) = x_3 \mid X(t_2) = x_2, X(t_1) = x_1]\,P[X(t_2) = x_2, X(t_1) = x_1]$
$= P[X(t_3) = x_3 \mid X(t_2) = x_2]\,P[X(t_2) = x_2, X(t_1) = x_1]$

$= P[X(t_3) = x_3 \mid X(t_2) = x_2]\,P[X(t_2) = x_2 \mid X(t_1) = x_1]\,P[X(t_1) = x_1],$


where we have used the definition of conditional probability and the Markov property. In general, the joint pmf for $k + 1$ arbitrary time instants is


$P[X(t_{k+1}) = x_{k+1}, X(t_k) = x_k, \ldots, X(t_1) = x_1]$

$= P[X(t_{k+1}) = x_{k+1} \mid X(t_k) = x_k]\,P[X(t_k) = x_k \mid X(t_{k-1}) = x_{k-1}] \cdots P[X(t_1) = x_1]$

$= \left\{\prod_{j=1}^{k} P[X(t_{j+1}) = x_{j+1} \mid X(t_j) = x_j]\right\} P[X(t_1) = x_1].$ (11.3)


Thus the joint pmf of $X(t)$ at arbitrary time instants is given by the product of the pmf of the initial time instant and the probabilities for the subsequent state transitions. Clearly, the state transition probabilities determine the statistical behavior of a Markov chain.


11.2 DISCRETE-TIME MARKOV CHAINS


Let $X_n$ be a discrete-time integer-valued Markov chain that starts at $n = 0$ with pmf


$p_j(0) \triangleq P[X_0 = j], \qquad j = 0, 1, 2, \ldots$ (11.4)


¹See Cox and Miller [6] for a discussion of continuous-valued Markov processes.

We will assume that $X_n$ takes on values from a countable set of integers, usually $\{0, 1, 2, \ldots\}$. We say that the Markov chain is finite state if $X_n$ takes on values from a finite set.

From Eq. (11.3), the joint pmf for the first $n + 1$ values of the process is


$P[X_n = i_n, \ldots, X_0 = i_0] = P[X_n = i_n \mid X_{n-1} = i_{n-1}] \cdots P[X_1 = i_1 \mid X_0 = i_0]\,P[X_0 = i_0].$ (11.5)

Thus the joint pmf for a particular sequence is simply the product of the probability for the initial state and the probabilities for the subsequent one-step state transitions.


We will assume that the one-step state transition probabilities are fixed and do not change with time, that is,


$P[X_{n+1} = j \mid X_n = i] = p_{ij}$ for all $n$. (11.6)

$X_n$ is said to have homogeneous transition probabilities. The joint pmf for $X_n, \ldots, X_0$ is then given by

$P[X_n = i_n, \ldots, X_0 = i_0] = p_{i_{n-1},i_n} \cdots p_{i_0,i_1}\,p_{i_0}(0).$ (11.7)


Thus $X_n$ is completely specified by the initial pmf $p_i(0)$ and the matrix of one-step transition probabilities $P$:


$P = \begin{bmatrix} p_{00} & p_{01} & p_{02} & \cdots \\ p_{10} & p_{11} & p_{12} & \cdots \\ \vdots & \vdots & \vdots & \\ p_{i0} & p_{i1} & \cdots & \\ \vdots & \vdots & & \end{bmatrix}.$ (11.8)


We will call $P$ the transition probability matrix. Note that each row of $P$ must add to one since


$1 = \sum_j P[X_{n+1} = j \mid X_n = i] = \sum_j p_{ij}.$ (11.9)


If the Markov chain is finite state, then the matrix $P$ will be an $n \times n$ nonnegative square matrix with rows that add up to 1.
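
Equations (11.7) and (11.9) translate directly into code. The sketch below is illustrative only: the helper names `is_stochastic` and `path_probability`, and the numerical values of the matrix and initial pmf, are assumptions made for the example and do not come from the text.

```python
def is_stochastic(P, tol=1e-12):
    """Check Eq. (11.9): entries are nonnegative and every row sums to one."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )

def path_probability(P, p0, path):
    """Joint pmf of a state sequence i_0, i_1, ..., i_n as in Eq. (11.7)."""
    prob = p0[path[0]]                      # probability of the initial state
    for i, j in zip(path, path[1:]):        # one factor per one-step transition
        prob *= P[i][j]
    return prob

# Hypothetical two-state chain and initial pmf, chosen only for illustration.
P = [[0.9, 0.1],
     [0.2, 0.8]]
p0 = [0.5, 0.5]

print(is_stochastic(P))
print(path_probability(P, p0, [0, 0, 1, 1]))   # 0.5 * 0.9 * 0.1 * 0.8
```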


Example 11.6 Two-State Markov Chain for Speech Activity 


A Markov model for packet speech assumes that if the $n$th packet contains silence, then the probability of silence in the next packet is $1 - \alpha$ and the probability of speech activity is $\alpha$. Similarly, if the $n$th packet contains speech activity, then the probability of speech activity in the next packet is $1 - \beta$ and the probability of silence is $\beta$.

Let $X_n$ be the indicator function for speech activity in a packet at time $n$; then $X_n$ is a two-state Markov chain with the state transition diagram shown in Fig. 11.2(a), and transition probability matrix


$P = \begin{bmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{bmatrix}.$ (11.10)

FIGURE 11.2
(a) State transition diagram for two-state Markov chain. (b) State transition diagram for Markov chain for light bulb inventory. (c) State transition diagram for binomial counting process.


FIGURE 11.3
Trellis diagrams for Markov chain examples.


The sample functions of $X_n$ can be viewed as traversing the trellis diagram in Fig. 11.3(a), which shows the possible values of the process over time. At any given time, the process occupies the “state” that corresponds to its value. The sample function is realized as the process steps from one state at a given time instant to a state in the next time instant. The transitions are determined according to the transition probability matrix.
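
One way to picture a sample function stepping through the trellis is to simulate it. This is a minimal sketch under stated assumptions: the function name `simulate_two_state`, the seed handling, and the values chosen for alpha and beta are hypothetical, and the matrix is the two-state matrix of Eq. (11.10).

```python
import random

def simulate_two_state(alpha, beta, n_steps, x0=0, seed=None):
    """Generate one sample path X_0, ..., X_{n_steps} of the two-state chain.

    State 0 is silence and state 1 is speech activity; the transition
    matrix is [[1 - alpha, alpha], [beta, 1 - beta]] as in Eq. (11.10).
    """
    rng = random.Random(seed)
    P = [[1 - alpha, alpha], [beta, 1 - beta]]
    path = [x0]
    for _ in range(n_steps):
        state = path[-1]
        # Move to state 1 with probability P[state][1], otherwise to state 0.
        path.append(1 if rng.random() < P[state][1] else 0)
    return path

# Hypothetical parameter values, chosen only for illustration.
print(simulate_two_state(alpha=0.1, beta=0.2, n_steps=20, seed=1))
```

With small alpha and beta the path tends to stay in its current state for long runs, which is the burst-like behavior the speech model is meant to capture.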


Example 11.7 


On day 0 a house has two new light bulbs in reserve. The probability that the house will need a single new light bulb during day $n$ is $p$, and the probability that it will not need any is $q = 1 - p$. Let $Y_n$ be the number of new light bulbs left in the house at the end of day $n$. $Y_n$ is a Markov chain with state transition diagram shown in Fig. 11.2(b), and transition probability matrix


$P = \begin{bmatrix} 1 & 0 & 0 \\ p & q & 0 \\ 0 & p & q \end{bmatrix}.$


The trellis diagram for this process in Fig. 11.3(b) shows that, unless $q = 1$, the transition probabilities bias the process towards the “trapping” state $Y_n = 0$. Thus the sample functions of $Y_n$ are nonincreasing functions of $n$.


Example 11.8 Binomial Counting Process 


Let $S_n$ be the binomial counting process introduced in Example 9.15. In one step, $S_n$ can either stay the same or increase by one. The state transition diagram is shown in Fig. 11.2(c), and the transition probability matrix is given by


$P = \begin{bmatrix} 1 - p & p & 0 & 0 & \cdots \\ 0 & 1 - p & p & 0 & \cdots \\ 0 & 0 & 1 - p & p & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{bmatrix}.$


The trellis diagram for the binomial counting process in Fig. 11.3(c) shows that, unless $q = 1$ (where $q = 1 - p$), the transition probabilities bias the process towards steady growth over time. The sample functions of $S_n$ are nondecreasing functions of $n$.


11.2.1 The n-Step Transition Probabilities


To evaluate the joint pmf for arbitrary time instants (see Eq. 11.3), we need to know the transition probabilities for an arbitrary number of steps. Let $P(n) = \{p_{ij}(n)\}$ be the matrix of $n$-step transition probabilities, where


$p_{ij}(n) = P[X_{n+k} = j \mid X_k = i], \qquad n \ge 0,\ i, j \ge 0.$ (11.11)


Note that $P[X_{n+k} = j \mid X_k = i] = P[X_n = j \mid X_0 = i]$ for all $n \ge 0$ and $k \ge 0$, since the transition probabilities do not depend on time.
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First, consider the two-step transition probabilities. The probability of going from 
state i at t = 0, passing through state k at t = 1, and ending at state j at t = 2 is 


PX = j, Xı = k, Xo = i] 


PIX = j, X= k| Xo = i] 
P(X = j| X1 = k]P[X, = k| Xo = IPL Xo = i] 
P[X = i] 
PIX = j| X, = k]P[X, = k| Xo = i] 

= Pik(1) pej(1). 
Note that p;Į(1) and px;(1) are components of P, the one-step transition probability 


matrix. We obtain p;;(2), the probability of going from i at t = 0 to j at t = 2, by sum- 
ming over all possible intermediate states k: 


Di(2 > Pik(1) pej(1 for all i, j. (11.12a) 


Equation (11.12a) states that the ij entry of P(2) is obtained by multiplying the ith row 
of P(1) by the jth column of P(1). In other words, P(2) is obtained by multiplying the 
one-step transition probability matrices: 


$$P(2) = P(1)P(1) = P^2. \tag{11.12b}$$


Now consider the probability of going from state $i$ at $t = 0$, passing through state $k$ at $t = m$, and ending at state $j$ at time $t = m + n$. Following the same procedure as above we obtain the Chapman-Kolmogorov equations:

$$p_{ij}(m + n) = \sum_k p_{ik}(m)\, p_{kj}(n) \qquad \text{for all } n, m \ge 0 \text{ and all } i, j. \tag{11.13a}$$

Therefore the matrix of $n + m$ step transition probabilities $P(n + m) = \{p_{ij}(n + m)\}$ is obtained by the following matrix multiplication:

$$P(n + m) = P(n)P(m). \tag{11.13b}$$

It is easy to show by an induction argument that this implies that:

$$P(n) = P^n. \tag{11.14}$$

When the Markov chain has a finite state space, we can use computer programs to calculate the powers of $P$ numerically.
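As an illustration of Eqs. (11.13b) and (11.14), the following minimal sketch computes powers of a transition matrix in plain Python and checks that $P^5 = P^2 P^3$. The helper names `mat_mul` and `mat_pow` and the numerical values $p = 0.3$, $q = 0.7$ for the light bulb chain are assumptions chosen only for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """Return P^n by repeated squaring; P^0 is the identity matrix."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = [row[:] for row in p]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Light bulb chain with assumed values p = 0.3 and q = 0.7.
P = [[1.0, 0.0, 0.0],
     [0.3, 0.7, 0.0],
     [0.0, 0.3, 0.7]]

P5 = mat_pow(P, 5)                                   # P(5) = P^5
P2_times_P3 = mat_mul(mat_pow(P, 2), mat_pow(P, 3))  # P(2) P(3)

for row in P5:
    print([round(x, 5) for x in row])
```

Each row of `P5` is a pmf, so its entries sum to one, and `P5` agrees with `P2_times_P3` up to rounding error.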


The State Probabilities 


Now consider the state probabilities at time $n$. Let $\mathbf{p}(n) = \{p_j(n)\}$ denote the row vector of state probabilities at time $n$. The probability $p_j(n)$ is related to $\mathbf{p}(n - 1)$ by

$$p_j(n) = \sum_i P[X_n = j \mid X_{n-1} = i]\, P[X_{n-1} = i] = \sum_i p_{ij}\, p_i(n - 1). \tag{11.15a}$$

Equation (11.15a) states that $\mathbf{p}(n)$ is obtained by multiplying the row vector $\mathbf{p}(n - 1)$ by the matrix $P$:

$$\mathbf{p}(n) = \mathbf{p}(n - 1)P. \tag{11.15b}$$

Similarly, $p_j(n)$ is related to $\mathbf{p}(0)$ by

$$p_j(n) = \sum_i P[X_n = j \mid X_0 = i]\, P[X_0 = i] = \sum_i p_{ij}(n)\, p_i(0), \tag{11.16a}$$

and in matrix notation

$$\mathbf{p}(n) = \mathbf{p}(0)P(n) = \mathbf{p}(0)P^n, \qquad n = 1, 2, \ldots. \tag{11.16b}$$

Thus the state pmf at time $n$ is obtained by multiplying the initial state pmf by $P^n$.
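The recursion in Eq. (11.15b) can be sketched in a few lines of Python. The example below propagates the state pmf of the light bulb chain of Example 11.7, starting with two bulbs, and compares the result with the direct argument that exactly $k$ bulbs are used in $n$ days with a binomial-type probability. The values $p = 0.3$, $q = 0.7$ and the function names are assumptions made for illustration.

```python
p, q = 0.3, 0.7  # assumed values with p + q = 1
P = [[1.0, 0.0, 0.0],
     [p, q, 0.0],
     [0.0, p, q]]


def step(pmf, matrix):
    """One application of p(n) = p(n-1) P for a row vector pmf."""
    size = len(pmf)
    return [sum(pmf[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def direct_pmf(n):
    """State pmf after n days when starting with two bulbs."""
    two_left = q ** n                 # no bulb needed in n days
    one_left = n * p * q ** (n - 1)   # exactly one bulb needed in n days
    return [1.0 - two_left - one_left, one_left, two_left]


pmf = [0.0, 0.0, 1.0]  # start with two new light bulbs
n_days = 10
for _ in range(n_days):
    pmf = step(pmf, P)

expected = direct_pmf(n_days)
print([round(x, 6) for x in pmf])
```

The iterated pmf and the direct calculation agree to within rounding error, and the probability of state 0 grows toward one as `n_days` increases.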


Example 11.9 

To find the n-step transition probability in Example 11.7, note that

$$\begin{aligned}
p_{22}(n) &= P[\text{no new light bulbs needed in } n \text{ days}] = q^n \\
p_{21}(n) &= P[\text{1 light bulb needed in } n \text{ days}] = npq^{n-1} \\
p_{20}(n) &= 1 - p_{22}(n) - p_{21}(n).
\end{aligned}$$

The other terms in $P(n)$ are found in similar fashion, thus

$$P(n) = \begin{bmatrix} 1 & 0 & 0 \\ 1 - q^n & q^n & 0 \\ 1 - q^n - npq^{n-1} & npq^{n-1} & q^n \end{bmatrix}.$$

Note that if $q < 1$ then, as $n \to \infty$,

$$P(n) \to \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$

As a result, the state pmf $\mathbf{p}(n) = (p_0(n), p_1(n), p_2(n))$ approaches

$$\mathbf{p}(n) = (p_0(0), p_1(0), p_2(0))P(n) = (0, 0, 1)P(n) \to (0, 0, 1)\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} = (1, 0, 0),$$

where $(p_0(0), p_1(0), p_2(0))$ is the row vector of initial state probabilities and $(p_0(0), p_1(0), p_2(0)) = (0, 0, 1)$ since we start with two light bulbs. As time progresses, $p_0(n) \to 1$. In words, the above equation states that we eventually run out of light bulbs!


Example 11.10
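Let $\alpha = 1/10$ and $\beta = 1/5$ in Example 11.6. Find $P(n)$ for $n = 2, 4, 8,$ and $16$.

$$P = \begin{bmatrix} .9 & .1 \\ .2 & .8 \end{bmatrix}, \qquad P^2 = \begin{bmatrix} .83 & .17 \\ .34 & .66 \end{bmatrix}, \qquad P^4 = \begin{bmatrix} .7467 & .2533 \\ .5066 & .4934 \end{bmatrix},$$

$$P^8 = \begin{bmatrix} .6859 & .3141 \\ .6282 & .3718 \end{bmatrix}, \qquad P^{16} = \begin{bmatrix} .6678 & .3322 \\ .6644 & .3356 \end{bmatrix}.$$

There is a clear trend here: It appears that as $n \to \infty$,

$$P^n \to \begin{bmatrix} 2/3 & 1/3 \\ 2/3 & 1/3 \end{bmatrix}.$$

We can use matrix diagonalization methods from linear algebra to find $P^n$ [Anton, p. 246]. First we find that the eigenvalues of $P$ are $1$ and $1 - \alpha - \beta$ from:

$$0 = \det(P - \lambda I) = \begin{vmatrix} 1 - \alpha - \lambda & \alpha \\ \beta & 1 - \beta - \lambda \end{vmatrix} = (1 - \alpha - \lambda)(1 - \beta - \lambda) - \alpha\beta = (1 - \lambda)(1 - \alpha - \beta - \lambda).$$

The corresponding eigenvectors are:

$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{e}_2 = \begin{bmatrix} \alpha \\ -\beta \end{bmatrix},$$

so the matrix with eigenvectors as columns is:

$$E = \begin{bmatrix} 1 & \alpha \\ 1 & -\beta \end{bmatrix}.$$

We then have that:

$$P = E \Lambda E^{-1} = \frac{1}{\alpha + \beta} \begin{bmatrix} 1 & \alpha \\ 1 & -\beta \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 - \alpha - \beta \end{bmatrix} \begin{bmatrix} \beta & \alpha \\ 1 & -1 \end{bmatrix}.$$

The payoff is in the calculation of $P^n$:

$$\begin{aligned}
P^n &= (E \Lambda E^{-1})(E \Lambda E^{-1}) \cdots (E \Lambda E^{-1}) = E \Lambda (E^{-1}E) \Lambda \cdots \Lambda (E^{-1}E) \Lambda E^{-1} \\
&= E \Lambda \Lambda \cdots \Lambda E^{-1} = E \Lambda^n E^{-1} \\
&= \frac{1}{\alpha + \beta} \begin{bmatrix} \beta & \alpha \\ \beta & \alpha \end{bmatrix} + \frac{(1 - \alpha - \beta)^n}{\alpha + \beta} \begin{bmatrix} \alpha & -\alpha \\ -\beta & \beta \end{bmatrix}.
\end{aligned}$$

As long as $|1 - \alpha - \beta| < 1$, the second term goes to zero as $n \to \infty$ and so

$$P^n \to \begin{bmatrix} \dfrac{\beta}{\alpha + \beta} & \dfrac{\alpha}{\alpha + \beta} \\ \dfrac{\beta}{\alpha + \beta} & \dfrac{\alpha}{\alpha + \beta} \end{bmatrix} = \begin{bmatrix} 2/3 & 1/3 \\ 2/3 & 1/3 \end{bmatrix}.$$

Note that all the rows are the same in the limiting matrix.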
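The diagonalization result for the two-state chain, $P^n = \frac{1}{\alpha+\beta}\begin{bmatrix}\beta & \alpha\\ \beta & \alpha\end{bmatrix} + \frac{(1-\alpha-\beta)^n}{\alpha+\beta}\begin{bmatrix}\alpha & -\alpha\\ -\beta & \beta\end{bmatrix}$, can be checked numerically against repeated multiplication. The following is only a sketch in plain Python; the function names are hypothetical.

```python
alpha, beta = 0.1, 0.2  # values used in Example 11.10


def closed_form_power(n):
    """P^n for the two-state chain from the eigenvalue decomposition."""
    s = alpha + beta
    r = (1 - alpha - beta) ** n
    return [[(beta + alpha * r) / s, (alpha - alpha * r) / s],
            [(beta - beta * r) / s, (alpha + beta * r) / s]]


def direct_power(n):
    """P^n obtained by multiplying P by itself n times."""
    P = [[1 - alpha, alpha], [beta, 1 - beta]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = [[sum(result[i][k] * P[k][j] for k in range(2))
                   for j in range(2)] for i in range(2)]
    return result


for n in (2, 4, 8, 16):
    print(n, [[round(x, 4) for x in row] for row in closed_form_power(n)])
```

Both functions give the same matrices for $n = 2, 4, 8, 16$, and the entries approach $2/3$ and $1/3$ at the geometric rate $(7/10)^n$.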


Example 11.11 
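Let the initial state probabilities in Example 11.10 be

$$P[X_0 = 0] = p_0(0) \quad \text{and} \quad P[X_0 = 1] = 1 - p_0(0).$$

Find the state probabilities as $n \to \infty$.

The state probability vector at time $n$ is:

$$\mathbf{p}(n) = (p_0(0), 1 - p_0(0))P^n = (p_0(0), 1 - p_0(0)) \left\{ \frac{1}{\alpha + \beta} \begin{bmatrix} \beta & \alpha \\ \beta & \alpha \end{bmatrix} + \frac{(1 - \alpha - \beta)^n}{\alpha + \beta} \begin{bmatrix} \alpha & -\alpha \\ -\beta & \beta \end{bmatrix} \right\}.$$

As $n \to \infty$, we have that

$$\mathbf{p}(n) \to (p_0(0), 1 - p_0(0)) \begin{bmatrix} \dfrac{\beta}{\alpha + \beta} & \dfrac{\alpha}{\alpha + \beta} \\ \dfrac{\beta}{\alpha + \beta} & \dfrac{\alpha}{\alpha + \beta} \end{bmatrix} = \left( \frac{\beta}{\alpha + \beta}, \frac{\alpha}{\alpha + \beta} \right) = \left( \frac{2}{3}, \frac{1}{3} \right).$$

We see that the state probabilities do not depend on the initial state probabilities as $n \to \infty$.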

Example 11.12 Google PageRank 


A Web surfer browses pages in a five-page Web universe shown in Fig. 11.4(a). The surfer selects 
the next page to view by selecting with equal probability from the pages pointed to by the cur- 
rent page. If a page has no outgoing link (e.g., page 2), then the surfer selects any of the pages in 
the universe with equal probability. Find the probability that the surfer views page i. 


FIGURE 11.4
State-transition diagrams for PageRank examples, panels (a) and (b).


The viewing behavior can be modeled by a Markov chain where the state represents the page currently viewed. If the current page points to $k$ pages, then the next page is selected from that group with probability $1/k$. If the current page does not point to any pages, then the next page can be any of the 5 pages with probability 1/5. The transition probability matrix for the Markov chain is:

$$P = [p_{ij}] = \begin{bmatrix} 0 & 1/2 & 1/2 & 0 & 0 \\ 1/5 & 1/5 & 1/5 & 1/5 & 1/5 \\ 1/3 & 1/3 & 0 & 1/3 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1/2 & 0 & 1/2 \end{bmatrix}.$$

We can obtain the limiting state probabilities numerically by letting Octave calculate a high power of $P$, say $P^{50}$. We then obtain a $5 \times 5$ matrix in which all the rows are equal:

$$\mathbf{p}(n) \to (0.12195, 0.18293, 0.25610, 0.12195, 0.31707).$$


In the next subsection we will show an easier way of finding the steady state pmf. 

The random surfer model forms the basis for the PageRank algorithm that was introduced 
by Google to rank the importance of a page in the Web. The rank of a page is given by the steady 
state probability of the page in the Markov chain model. The size of the state space in this 
Markov chain is in the billions of pages! 
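The numerical approach of Example 11.12 can be sketched in Python instead of Octave by repeatedly applying $\mathbf{p}(n) = \mathbf{p}(n-1)P$. This is only an illustration: rows 4 and 5 of the matrix below are an assumption, taken from the five-page matrix that appears in Example 11.30, and the number of iterations is an arbitrary choice.

```python
# Transition matrix of the five-page random surfer model.
# Rows 4 and 5 are assumed (copied from the matrix in Example 11.30).
P = [[0, 1/2, 1/2, 0, 0],
     [1/5, 1/5, 1/5, 1/5, 1/5],
     [1/3, 1/3, 0, 1/3, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 1/2, 0, 1/2]]


def limiting_pmf(matrix, steps=500):
    """Approximate the limiting state pmf by iterating p(n) = p(n-1) P."""
    size = len(matrix)
    pmf = [1.0 / size] * size  # any initial pmf works for this chain
    for _ in range(steps):
        pmf = [sum(pmf[i] * matrix[i][j] for i in range(size))
               for j in range(size)]
    return pmf


rank = limiting_pmf(P)
print([round(x, 5) for x in rank])
```

With this matrix the iteration settles on the pmf $(10, 15, 21, 10, 26)/82$, which matches the five-digit values quoted in the example.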


Steady State Probabilities 


Example 11.11 is typical of Markov chains that settle into stationary behavior after the process has been running for a long time. As $n \to \infty$, the n-step transition probability matrix approaches a matrix in which all the rows are equal to the same pmf, that is,

$$p_{ij}(n) \to \pi_j \qquad \text{for all } i. \tag{11.17a}$$

We can express the above in matrix notation as:

$$P^n \to \mathbf{1}\boldsymbol{\pi}, \tag{11.17b}$$

where $\mathbf{1}$ is a column vector of all 1's, that is, $\mathbf{1}^T = (1, 1, \ldots)$, and $\boldsymbol{\pi} = (\pi_0, \pi_1, \ldots)$. From Eq. (11.16a), the convergence of $P^n$ implies the convergence of the state pmf's:

$$p_j(n) = \sum_i p_{ij}(n)\, p_i(0) \to \sum_i \pi_j\, p_i(0) = \pi_j. \tag{11.18}$$


We say that the system reaches “equilibrium” or “steady state.” 
We can find the pmf $\boldsymbol{\pi} = \{\pi_j\}$ in Eq. (11.18) (when it exists) by noting that as $n \to \infty$, $p_j(n) \to \pi_j$ and $p_i(n - 1) \to \pi_i$, so Eq. (11.15a) approaches

$$\pi_j = \sum_i p_{ij}\, \pi_i, \tag{11.19a}$$

which in matrix notation is

$$\boldsymbol{\pi} = \boldsymbol{\pi}P. \tag{11.19b}$$

Equation (11.19b) is underdetermined and requires the normalization equation:

$$\sum_i \pi_i = 1. \tag{11.19c}$$

We refer to $\boldsymbol{\pi}$ as the stationary state pmf of the Markov chain. If we start the Markov chain with initial state pmf $\mathbf{p}(0) = \boldsymbol{\pi}$, then by Eqs. (11.16b) and (11.19b) we have that the state probability vector

$$\mathbf{p}(n) = \boldsymbol{\pi}P^n = \boldsymbol{\pi} \qquad \text{for all } n.$$


The resulting process is a stationary random process as defined in Section 9.6, since the probability of the sequence of states $i_0, i_1, \ldots, i_n$ starting at time $k$ is, by Eq. (11.5),

$$\begin{aligned}
P[X_{n+k} = i_n, \ldots, X_k = i_0] &= P[X_{n+k} = i_n \mid X_{n+k-1} = i_{n-1}] \cdots P[X_{1+k} = i_1 \mid X_k = i_0]\, P[X_k = i_0] \\
&= P[X_n = i_n \mid X_{n-1} = i_{n-1}] \cdots P[X_1 = i_1 \mid X_0 = i_0]\, \pi_{i_0} \\
&= p_{i_{n-1}, i_n} \cdots p_{i_0, i_1}\, \pi_{i_0},
\end{aligned}$$

which is independent of the initial time $k$. Thus the probabilities are independent of the choice of time origin, and the process is stationary.


Example 11.13 


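Find the stationary state pmf in Example 11.6.

Equation (11.19a) gives

$$\begin{aligned}
\pi_0 &= (1 - \alpha)\pi_0 + \beta\pi_1 \\
\pi_1 &= \alpha\pi_0 + (1 - \beta)\pi_1,
\end{aligned}$$

which imply that $\alpha\pi_0 = \beta\pi_1 = \beta(1 - \pi_0)$ since $\pi_0 + \pi_1 = 1$. Thus

$$\pi_0 = \frac{\beta}{\alpha + \beta} = \frac{2}{3} \qquad \text{and} \qquad \pi_1 = \frac{\alpha}{\alpha + \beta} = \frac{1}{3}.$$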
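For chains with more than two states, the balance equations $\boldsymbol{\pi} = \boldsymbol{\pi}P$ together with the normalization $\sum_i \pi_i = 1$ form a linear system that can be solved by elimination. The sketch below is one possible approach in plain Python using exact rational arithmetic; the function name `stationary_pmf` is hypothetical, and the routine assumes that the stationary pmf is unique.

```python
from fractions import Fraction


def stationary_pmf(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination.

    Assumes a unique solution exists; otherwise the pivot search fails.
    """
    n = len(P)
    # Row j holds the balance equation for state j: sum_i pi_i p_ij - pi_j = 0.
    A = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(n)]
         for j in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace it with the normalization.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / A[col][col]
        A[col] = [x * inv for x in A[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return b


# Two-state chain of Example 11.6 with alpha = 1/10 and beta = 1/5.
alpha, beta = Fraction(1, 10), Fraction(1, 5)
two_state = [[1 - alpha, alpha], [beta, 1 - beta]]
pi = stationary_pmf(two_state)
print(pi)
```

For the two-state chain the solver returns $\pi_0 = 2/3$ and $\pi_1 = 1/3$, in agreement with $\beta/(\alpha+\beta)$ and $\alpha/(\alpha+\beta)$.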


In this section we have shown the typical behavior of many Markov chains where the n-step transition probabilities and the state probabilities converge to constants that are independent of the initial conditions. These constant probabilities are found by solving the set of linear equations (11.19). It is worth noting, however, that not all Markov chains settle into stationary behavior where the process “forgets” the initial conditions. For example, the binomial counting process (Example 9.15) with $p > 0$ grows steadily so that for any fixed $j$, $p_j(n) \to 0$ as $n \to \infty$. The following example shows two atypical situations where the initial conditions determine the behavior for all time.


Example 11.14 Two-State Process with Atypical Behavior 


Consider the two-state process with state transition diagram shown in Fig. 11.2(a). In Example 11.10 we found that the two-state process settles into steady state behavior so long as $|1 - \alpha - \beta| < 1$. Let’s see what happens when this condition is not satisfied.

Consider first the case where $\alpha = \beta = 1$, and suppose that we start the process in state 0, that is, $p_0(0) = 1$. The state probabilities at time $n$ are:

$$\mathbf{p}(n) = (p_0(0), 1 - p_0(0))P^n = (1, 0) \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}^n.$$

The process in this case alternates between state 0 at even time instants and state 1 at odd time instants. $P^n$ does not converge, and instead alternates between the values $P$ and $P^2 = I$. The state probability vector alternates between the values $(1, 0)$ and $(0, 1)$ so it does not exhibit convergence.

Now consider the case $\alpha = \beta = 0$, and suppose again that we start the process in state 0, that is, $p_0(0) = 1$. The state probabilities at time $n$ are:

$$\mathbf{p}(n) = (1, 0) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^n = (1, 0) \qquad \text{for all } n.$$


In this case, the process remains fixed at state 0, which was selected at the initial time instant. 
Note that the process would have remained fixed at state 1 if state 1 had been selected initially. 
The state probability vector remains fixed at (1, 0) if the initial state was 0 or (0, 1) if the initial 
state was 1. In this case, both P” and p(n) converge immediately but to values that are deter- 
mined by the initial condition. 


The previous example demonstrates that we need to identify the conditions 
under which the state probability of Markov chains will converge to a stationary pmf 
that is found from Eq. (11.19). This is the topic of the next section. 


CLASSES OF STATES, RECURRENCE PROPERTIES, AND LIMITING PROBABILITIES 


In this section we take a closer look at the relation between the behavior of a Markov 
chain and its transition probability matrix. First we see that the states of a discrete-time 
Markov chain can be divided into one or more separate classes and that these classes 
can be of several types. We then show that the long-term behavior of a Markov chain is 
related to the types of its state classes. Figure 11.5 summarizes the types of classes to 
which a state can belong and identifies the associated long-term behavior. 


Classes of States 


We say that state $j$ is accessible from state $i$ if for some $n \ge 0$, $p_{ij}(n) > 0$, that is, if there is a sequence of transitions from $i$ to $j$ that has nonzero probability. We say that states $i$ and $j$ communicate if they are accessible to each other; we then write $i \leftrightarrow j$. Note that a state communicates with itself since $p_{ii}(0) = 1$.

If state $i$ communicates with state $j$ and state $j$ communicates with state $k$, that is, $i \leftrightarrow j$ and $j \leftrightarrow k$, then state $i$ communicates with $k$. To see this, note that $i \leftrightarrow j$ implies that there is a nonzero probability path from $i$ to $j$ and $j \leftrightarrow k$ implies that there is a subsequent nonzero probability path from $j$ to $k$. The combined paths form a nonzero probability path from $i$ to $k$. A nonzero probability path in the reverse direction exists for the same reasons.

FIGURE 11.5
Classification of states and associated long-term behavior. The proportion of time spent in state $j$ is denoted by $\pi_j$. A state is either transient or recurrent. A recurrent state is either positive recurrent ($\pi_j > 0$) or null recurrent ($\pi_j = 0$). A positive recurrent state is either aperiodic, with $\lim_{n \to \infty} p_{jj}(n) = \pi_j$, or periodic, with $\lim_{n \to \infty} p_{jj}(nd) = d\pi_j$.


We say that two states belong to the same class if they communicate with each 
other. Note that two different classes of states must be disjoint since having a state in 
common would imply that the states from both classes communicate with each other. 
Thus the states of a Markov chain consist of one or more disjoint communication classes. 
A Markov chain that consists of a single class is said to be irreducible. 


Example 11.15 


Figure 11.6(a) shows the state transition diagram for a Markov chain with three classes: 
{0}, {1, 2}, and {3}. 


Example 11.16 


Figure 11.6(b) shows the state transition diagram for a Markov chain with one class: {0, 1, 2, 3}. 
Thus the chain is irreducible. 


Example 11.17 Binomial Counting Process 


Figure 11.6(c) shows the state transition diagram for a binomial counting process. It can be seen that the classes are: {0}, {1}, {2}, ....

FIGURE 11.6
(a) A three-class Markov chain. (b) A periodic Markov chain. (c) A binomial counting process. (d) The random walk process.


Example 11.18 Random Walk 


Figure 11.6(d) shows the state transition diagram for the random walk process. If $0 < p < 1$, then the process has only one class, $\{0, \pm 1, \pm 2, \ldots\}$, so it is irreducible.


11.3.2 Recurrence Properties 


Suppose we start a Markov chain in state $i$. State $i$ is said to be recurrent if the process returns to the state with probability one, that is,

$$f_i = P[\text{ever returning to state } i] = 1. \tag{11.20a}$$

State $i$ is said to be transient if

$$f_i < 1. \tag{11.20b}$$

If we start the Markov chain in a recurrent state $i$, then the state reoccurs an infinite number of times. If we start the Markov chain in a transient state, the state does not reoccur after some finite number of returns. Each reoccurrence of the state can be viewed as a failure in a Bernoulli trial. The probability of failure is $f_i$. Thus the number of returns to state $i$ terminating with a success (no return) is a geometric random variable with mean $(1 - f_i)^{-1}$. If $f_i < 1$, then the probability of an infinite number of returns (failures) is zero. Therefore a transient state reoccurs only a finite number of times.

Let $X_n$ denote the Markov chain with initial state $i$, $X_0 = i$. Let $I_i(X)$ be the indicator function for state $i$, that is, $I_i(X)$ is equal to 1 if $X = i$ and equal to 0 otherwise. The expected number of returns to state $i$ is then

$$E\left[ \sum_{n=1}^{\infty} I_i(X_n) \,\Big|\, X_0 = i \right] = \sum_{n=1}^{\infty} E[I_i(X_n) \mid X_0 = i] = \sum_{n=1}^{\infty} p_{ii}(n) \tag{11.21}$$

since by Example 4.16

$$E[I_i(X_n) \mid X_0 = i] = P[X_n = i \mid X_0 = i] = p_{ii}(n).$$

A state is recurrent if and only if it reoccurs an infinite number of times, thus from Eq. (11.21) state $i$ is recurrent if and only if

$$\sum_{n=1}^{\infty} p_{ii}(n) = \infty. \tag{11.22}$$

Similarly, state $i$ is transient if and only if

$$\sum_{n=1}^{\infty} p_{ii}(n) < \infty. \tag{11.23}$$


Example 11.19 
In Example 11.15 (Fig. 11.6a), state 0 is transient since $p_{00}(n) = (1/2)^n$, so

$$\sum_{n=1}^{\infty} p_{00}(n) = \frac{1}{2} + \left( \frac{1}{2} \right)^2 + \left( \frac{1}{2} \right)^3 + \cdots = 1 < \infty.$$

On the other hand, if the process were started in state 1, we would have the two-state process discussed in Example 11.10. For such a process we found that

$$p_{11}(n) = \frac{\beta + \alpha(1 - \alpha - \beta)^n}{\alpha + \beta} = \frac{1/2 + 1/4\,(7/10)^n}{3/4}$$

so that

$$\sum_{n=1}^{\infty} p_{11}(n) = \sum_{n=1}^{\infty} \left( \frac{2}{3} + \frac{(7/10)^n}{3} \right) = \infty.$$

Therefore state 1 is recurrent.


Example 11.20 Binomial Counting Process


In the binomial counting process all the states are transient since $p_{ii}(n) = (1 - p)^n$ so that for $p > 0$,

$$\sum_{n=1}^{\infty} p_{ii}(n) = \sum_{n=1}^{\infty} (1 - p)^n = \frac{1 - p}{p} < \infty.$$


Example 11.21 Random Walk 


Consider state zero in the random walk process in Fig. 11.6(d). The state reoccurs in $2n$ steps if and only if $n$ $+1$s and $n$ $-1$s occur during the $2n$ steps. This occurs with probability

$$p_{00}(2n) = \binom{2n}{n} p^n (1 - p)^n.$$

Stirling’s formula for $n!$ can be used to show that

$$\binom{2n}{n} p^n (1 - p)^n \sim \frac{(4p(1 - p))^n}{\sqrt{\pi n}},$$

where $a_n \sim b_n$ when $\lim_{n \to \infty} a_n / b_n = 1$.

Thus Eq. (11.21) for state 0 is

$$\sum_{n=1}^{\infty} p_{00}(2n) \sim \sum_{n=1}^{\infty} \frac{(4p(1 - p))^n}{\sqrt{\pi n}}.$$

If $p = 1/2$, then $4p(1 - p) = 1$ and the series diverges. It then follows that state 0 is recurrent. If $p \ne 1/2$, then $4p(1 - p) < 1$, and the above series converges. This implies that state 0 is transient. Thus when $p = 1/2$, the random walk process maintains a precarious balance about 0. As soon as $p \ne 1/2$, a positive or negative drift is introduced and the process grows towards $\pm\infty$.
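The asymptotic formula and the resulting convergence test can be explored numerically. The sketch below, in plain Python, compares the exact return probability $p_{00}(2n)$ with the Stirling approximation and accumulates partial sums of the series. The function names and the cutoff $n \le 400$ (chosen to stay within floating-point range) are assumptions of this example.

```python
import math


def p00_exact(n, p):
    """Exact probability of returning to state 0 at step 2n."""
    return math.comb(2 * n, n) * (p * (1 - p)) ** n


def p00_stirling(n, p):
    """Approximation of p00(2n) obtained from Stirling's formula."""
    return (4 * p * (1 - p)) ** n / math.sqrt(math.pi * n)


def partial_sum(limit, p):
    """Sum of p00(2n) for n = 1, ..., limit."""
    return sum(p00_exact(n, p) for n in range(1, limit + 1))


ratio = p00_exact(400, 0.5) / p00_stirling(400, 0.5)
print("exact / approximation at n = 400:", ratio)

for limit in (100, 200, 400):
    print(limit, partial_sum(limit, 0.5), partial_sum(limit, 0.6))
```

For $p = 1/2$ the partial sums keep growing roughly like $\sqrt{n}$, while for $p = 0.6$ they level off, which is consistent with state 0 being recurrent only in the symmetric case.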


Recurrence and transience are class properties: If a state $i$ is recurrent, then all states in its class are recurrent; if a state is transient, then all the states in its class are transient. If state $i$ is recurrent, then all states in its class will be visited eventually as the process forever returns to state $i$ over and over again. Indeed all other states in its class will appear an infinite number of times.

To show the recurrence class property, let $i$ be a recurrent state and let $j$ be another state in the class. Then $i \leftrightarrow j$, and there are probabilities $p_{ji}(m) > 0$ and $p_{ij}(l) > 0$ that correspond to nonzero probability paths that lead from $j$ to $i$ in $m$ steps, and back from $i$ to $j$ in $l$ steps. We can identify many nonzero probability paths that go from $j$ to $j$ by splicing the above two paths to recurrent paths for state $i$: go from $j$ to $i$ using the above path; then from $i$ to $i$ using an n-step recurrent path; then back from $i$ to $j$ using the above path. The probabilities for these paths provide a lower bound to the recurrence probabilities for $j$:

$$\sum_{n=1}^{\infty} p_{jj}(m + n + l) \ge \sum_{n=1}^{\infty} p_{ji}(m)\, p_{ii}(n)\, p_{ij}(l) = p_{ji}(m)\, p_{ij}(l) \sum_{n=1}^{\infty} p_{ii}(n) = \infty$$

since state $i$ is recurrent. This implies that state $j$ is also recurrent. Now suppose that state $i$ is transient, and let $j$ be another state in its class. State $j$ cannot be recurrent, for this would imply that $i$ is recurrent, in contradiction to our assumption. Therefore $j$ must be transient.

If a Markov chain is irreducible then either all its states are transient or all its states 
are recurrent. If the Markov chain has a finite state space, it is impossible for all of its 
states to be transient. At least some of the states must occur an infinite number of 
times as time progresses, implying that all states are recurrent. Therefore, the states of a 
finite-state, irreducible Markov chain are all recurrent. If the state space is countably in- 
finite, then all the states can be transient. The random walk with p # 1/2 provides an 
example of such a Markov chain. 

The structure of the state transition diagram and the associated nonzero transition probabilities can impose periodicity in the realizations of a discrete-time Markov chain. We say that state $i$ has period $d$ if it can only reoccur at times that are multiples of $d$, that is, $p_{ii}(n) = 0$ whenever $n$ is not a multiple of $d$, where $d$ is the largest integer with this property. We say that state $i$ is aperiodic if it has period $d = 1$.

Periodicity is a class property, that is, all states in a class have the same period. An 
irreducible Markov chain is said to be aperiodic if the states in its single class have pe- 
riod one. An irreducible Markov chain is said to be periodic if its states have period 
d>1. 

To show that periodicity is a class property, suppose that state $i$ has period $d$ and let $j$ be another state in the same class. Since $i \leftrightarrow j$, there are probabilities $p_{ji}(m) > 0$ and $p_{ij}(l) > 0$ that correspond to paths that lead from $j$ to $i$ in $m$ steps, and back from $i$ to $j$ in $l$ steps. We can create a path from $j$ to $j$ by splicing the m-step path from $j$ to $i$ with the l-step path from $i$ to $j$; this path has length $m + l$ and probability $p_{ji}(m)\,p_{ij}(l) > 0$. The length $m + l$ must be divisible by $d'$, the period of state $j$. Now create multiple paths from $j$ to $j$ by attaching the above two paths to nonzero probability paths that go from $i$ to $i$ in $n$ steps. These paths have length $m + l + n$ and probability $p_{ji}(m)\,p_{ii}(n)\,p_{ij}(l) > 0$. All these paths go from $j$ to $j$, so $m + n + l$ must be divisible by $d'$. We already showed that $m + l$ is divisible by $d'$, so we have that $n$ must also be divisible by $d'$. But $n$ can be the length of any path that goes from $i$ to $i$, and so $d$, the period of state $i$, is the largest value that divides all such $n$. This implies that $d'$ must divide $d$. By reversing the roles of state $i$ and state $j$, the same series of arguments implies that $d$ must divide $d'$. Thus $d = d'$ and state $i$ and state $j$ have the same period.


Example 11.22 Two-State Process with Atypical Behavior 


Characterize the two “atypical” Markov chains in Example 11.14. 

In the case where $\alpha = \beta = 1$, Fig. 11.2(a) shows that we have a single communication class with period $d = 2$. This explains why the process alternates between state 0 at even time instants and state 1 at odd time instants.

In the case $\alpha = \beta = 0$, we have two communication classes: {0} and {1}. The selection of the initial state at $t = 0$ effectively picks one of the two classes, and the process remains in that class forever.


Example 11.23


In Example 11.15 (Fig 11.6a), all the states have the property that $p_{ii}(n) > 0$ for $n = 1, 2, \ldots$. Therefore all three classes in the Markov chain have period 1.


Example 11.24 


In the Markov chain in Fig 11.6(b), the states 0 and 1 can reoccur at time 2, 4,6,... and states 2 
and 3 at times 4, 6, 8,.... Therefore the Markov chain has period 2. 


Example 11.25 


In the random walk process in Fig 11.6(d), a state reoccurs when the number of successes (+1s) 
equals the number of failures (—1s). This can only happen after an even number of steps. The 
process therefore has period 2. 


Figure 11.7(a) summarizes the possible structures that can be encountered for Markov chains with finite state space. In the case of irreducible finite-state Markov chains, all states in the single class must be recurrent and the class can either be aperiodic or periodic. If a finite-state Markov chain consists of multiple transient classes and a single irreducible class, then the chain will eventually settle in the states of the irreducible class. Thus in the long run the behavior is the same as that of an irreducible chain. A finite-state Markov chain with multiple irreducible classes will eventually enter and remain thereafter in one of the irreducible classes. Over the long run, the chain will exhibit the behavior of an irreducible Markov chain with the given class of states. Thus the case of multiple irreducible classes can be viewed as a two-stage random experiment in which the first stage involves selecting one of the irreducible classes.

FIGURE 11.7
Possible structures for Markov chains. (a) Finite state: an irreducible chain is recurrent and either aperiodic (checkmarked) or periodic (checkmarked); a multi-class chain has either transient classes plus one irreducible class, or multiple irreducible classes. (b) Infinite state: an irreducible chain is either transient or recurrent, and a recurrent chain is either aperiodic (checkmarked) or periodic (checkmarked); otherwise the chain is multi-class.

Figure 11.7(b) summarizes the possible structures for Markov chains with infinite 
state space. The major difference from the finite case is that an irreducible class can have 
all of its states be transient. Consequently when a chain has multiple classes it is now pos- 
sible for the chain to enter and remain in a class that is either transient or recurrent. 


Limiting Probabilities 


If all the states in a Markov chain are transient, then all the state probabilities approach zero as $n \to \infty$. If a Markov chain has some transient classes and some recurrent classes, as in Fig. 11.6(a), then eventually the process enters and remains thereafter in one of the recurrent classes. Therefore we can concentrate on individual recurrent classes when studying the limiting probabilities of a chain. For this reason we assume in this section that we are dealing with an irreducible Markov chain.

Suppose we start a Markov chain in a recurrent state $i$ at time $n = 0$. Let $T_i(1), T_i(1) + T_i(2), \ldots$ be the times when the process returns to state $i$, where $T_i(k)$ is the time that elapses between the $(k - 1)$th and $k$th returns (see Fig. 11.8). The $T_i$ form an iid sequence since each return time is independent of previous return times.

The proportion of time spent in state $i$ after $k$ returns to $i$ is

$$\text{proportion of time in state } i = \frac{k}{T_i(1) + T_i(2) + \cdots + T_i(k)}. \tag{11.24}$$

FIGURE 11.8
Recurrence times for state $i$.

Since the state is recurrent, the process returns to state $i$ an infinite number of times. Thus the law of large numbers implies that, with probability one, the reciprocal of the above expression approaches the mean recurrence time $E[T_i]$ so the long-term proportion of time spent in state $i$ approaches

$$\text{proportion of time in state } i \to \frac{1}{E[T_i]} = \pi_i, \tag{11.25}$$

where $\pi_i$ is the long-term proportion of time spent in state $i$.

If $E[T_i] < \infty$, then we say that state $i$ is positive recurrent. Equation (11.25) then implies that

$$\pi_i > 0 \qquad \text{if state } i \text{ is positive recurrent.}$$

If $E[T_i] = \infty$, then we say that state $i$ is null recurrent. Equation (11.25) then implies that

$$\pi_i = 0 \qquad \text{if state } i \text{ is null recurrent.}$$

It can be shown that positive and null recurrence are class properties. 

Positive recurrent, aperiodic states are called ergodic. Once a Markov chain en- 
ters an ergodic state, then the process will remain in the state’s class forever. Further- 
more the process will visit all states in the class sufficiently frequently that the 
long-term proportion of time in a given state will be governed by Eq. (11.25) and ap- 
proach a nonzero value. Thus the process will reveal its underlying state probabilities 
through time averages. Given our previous discussion on ergodicity in Chapter 9, it is 
not surprising that an ergodic Markov chain is defined as an irreducible, aperiodic, pos- 
itive recurrent Markov chain. 


Example 11.26 


The process in Fig. 11.6(b) returns to state 0 in two steps with probability 1/2 and in four steps with probability 1/2. Therefore the mean recurrence time for state 0 is

$$E[T_0] = \frac{1}{2}(2) + \frac{1}{2}(4) = 3.$$

Therefore state 0 is positive recurrent and the long-term proportion of time spent in state 0 is

$$\pi_0 = \frac{1}{3}.$$

Example 11.27 Random Walk 


In Example 11.21 it was shown that the random walk process is recurrent if p = 1/2. However, 
the mean recurrence time can be shown to be infinite when p = 1/2 (Feller, 1968, p. 314). Thus 
all the states in the chain are null recurrent. 


The $\pi_j$’s in Eq. (11.25) satisfy the equations that define the stationary state pmf:

$$\pi_j = \sum_i \pi_i P_{ij} \qquad \text{for all } j \tag{11.26a}$$

and

$$1 = \sum_i \pi_i. \tag{11.26b}$$

To see this, note that since $\pi_i$ is the proportion of time spent in state $i$, then $\pi_i P_{ij}$ is the proportion of time in which state $j$ follows $i$. If we sum over all $i$, we then obtain the long-term proportion of time in state $j$, $\pi_j$.


Example 11.28 


The stationary state pmf for the periodic Markov chain in Fig. 11.6(b) is found from Eqs. (11.26a) and (11.26b):

$$\begin{aligned}
\pi_0 &= \tfrac{1}{2}\pi_1 + \pi_3 \\
\pi_1 &= \pi_0 \\
\pi_2 &= \tfrac{1}{2}\pi_1 \\
\pi_3 &= \pi_2.
\end{aligned}$$

These equations imply that $\pi_1 = \pi_0$ and $\pi_2 = \pi_3 = \pi_0/2$. Since the probabilities must add to one, we obtain

$$\pi_1 = \pi_0 = \frac{1}{3} \qquad \text{and} \qquad \pi_2 = \pi_3 = \frac{1}{6}.$$

Note that $\pi_0 = 1/3$ was obtained from the mean recurrence time in Example 11.26.


In Section 11.2 we found that for certain Markov chains, the n-step transition matrix approaches a fixed matrix of equal rows as $n \to \infty$ (see Eq. 11.17). We also saw that the rows of this limiting matrix consisted of a pmf that satisfied Eqs. (11.26a) and (11.26b). We are now ready to state under what conditions this occurs.


Theorem 1²
For an irreducible aperiodic Markov chain exactly one of the following assertions holds:

(i) All states are transient or all states are null recurrent; $p_{ij}(n) \to 0$ as $n \to \infty$ for all $i$ and $j$ and there exists no stationary pmf.

(ii) All states are positive recurrent, so

$$\lim_{n \to \infty} p_{ij}(n) = \pi_j \qquad \text{for all } j \tag{11.27}$$

where $\{\pi_j, j = 1, 2, 3, \ldots\}$ is the unique stationary pmf solution to Eq. (11.26ab).


² A proof of Theorem 1 is given by [Ross, pp. 108-110].

Theorem 1 states that for ergodic Markov chains, the n-step transition probabilities approach constant values given by the steady state pmf. Note that Eq. (11.27)
can be written in matrix form as shown in Eq. (11.17b). From Eq. (11.18), it then fol- 
lows that the state probabilities approach steady state values that are independent of 
the initial conditions. These steady state probabilities correspond to the stationary 
probabilities obtained by solving Eq. (11.26ab), and thus correspond to the long- 
term proportion of time spent in a given state. Theorem 1 also states that if the irre- 
ducible Markov chain is transient or null recurrent, then a stationary pmf solution to 
Eq. (11.26ab) does not exist. This implies that when we do find a solution, and the 
chain is irreducible and aperiodic, then the Markov chain must be positive recurrent 
and hence ergodic. 


Example 11.29 Age of a Device 


Consider a Markov chain that counts the age of a device in service at the end of each day. At the end of each day, the device either increases its age by 1 (with probability $a$) or fails and returns to the “1” state (with probability $1 - a$). A failed device is replaced at the beginning of the next day and the age counting process is resumed. Determine whether the Markov chain has a stationary distribution.

The state transition diagram for the Markov chain is shown in Fig. 11.9. If a > 0, then 
every state i can access any state i + 1, and consequently any state i can access any state j > i. 
In addition every state i can access state 1. This implies that there is a nonzero probability path 
between any two states, and so the Markov chain is irreducible. State 1 can reoccur in intervals of 
1,2,3,4,..., and so state 1 has period 1. Therefore all the states have period 1 and the Markov 
chain is aperiodic. 

The equations for the stationary probabilities are:

$$\begin{aligned}
\pi_1 &= (1 - a)\pi_1 + (1 - a)\pi_2 + \cdots = (1 - a)(\pi_1 + \pi_2 + \cdots) = 1 - a \\
\pi_{i+1} &= a\pi_i \qquad \text{for } i \ge 1.
\end{aligned}$$

By a simple induction argument we can show that:

$$\pi_i = (1 - a)a^{i-1} \qquad \text{for } i \ge 1.$$

Therefore the Markov chain is positive recurrent and has this stationary pmf.


FIGURE 11.9
Age of a device.

Example 11.30 Google PageRank Algorithm


In Example 11.12 we showed the basic approach for ranking Web pages according to an associ- 
ated Markov chain. The approach included a strategy to deal with the case where users become 
trapped in a page with no outgoing links, i.e., page 2 in Fig. 11.4(a). The approach, however, is 
not sufficient to ensure that the Markov chain is irreducible and aperiodic. For example, in 
Fig. 11.4(b) users can also become trapped in the periodic class {4, 5}. This poses a problem for 
the rank algorithm which uses the power of the transition probability matrix to obtain the sta- 
tionary pmf. To deal with this problem, the PageRank algorithm also assumes that each time a 
new page is selected, the procedure in Example 11.12 is used with probability a, but otherwise 
(with probability 1 — a) any of all possible Web pages is selected with equal probability. The 
value a = 0.85 is usually cited as appropriate. The modified ranking method then has a transition 
probability matrix that is aperiodic and irreducible and the conditions of Theorem 1 are satisfied. 
For the example in Fig. 11.4(b) we have: 


0 12 12 0 0 1/5 1⁄5 1⁄5 1/5 1/5 
1/5 WS 1/5 1⁄5 1/5 1/5 1⁄5 1⁄5 1⁄5 1/5 
P = (0.85)| 1/3 13 0 1⁄3 O |+ (0.15) 1/5 1⁄5 1⁄5 1⁄5 1/5 
0 0 0 0 1 1/5 1⁄5 1⁄5 1⁄5 1/5 
0 oO 12 0 1/2 1/5 1⁄5 1⁄5 1/5 1/5 
0.0300 0.4550 0.4550 0.0300 0.0300 
0.2000 0.2000 0.2000 0.2000 0.2000 
= | 0.3133 0.3133 0.0300 0.3133 0.0300 
0.0300 0.0300 0.0300 0.0300 0.8800 
0.0300 0.0300 0.4550 0.0300 0.4550 
The matrix P has a stationary state pmf given by: 
p(n) = (0.13175, 0.18772, 0.24642, 0.13173, 0.30239). 
See [Langville] for more details on the PageRank algorithm. 
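
The stationary pmf quoted in Example 11.30 can be checked numerically. The following is a minimal sketch, not the production PageRank procedure: it builds the mixed matrix from the link matrix of the example and applies repeated multiplication by P. The function names, the tolerance, and the iteration cap are illustrative assumptions.

```python
# Sketch: stationary pmf of the modified chain in Example 11.30 by power iteration.
# Function names and the stopping rule are illustrative assumptions.

def pagerank_matrix(link_matrix, alpha=0.85):
    """Mix the link-following matrix with a uniform jump taken with probability 1 - alpha."""
    n = len(link_matrix)
    return [[alpha * link_matrix[i][j] + (1.0 - alpha) / n for j in range(n)]
            for i in range(n)]

def step(pmf, P):
    """One step of the chain: the row vector pmf multiplied by P."""
    n = len(P)
    return [sum(pmf[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary_by_power_iteration(P, tol=1e-12, max_iter=10_000):
    """Iterate pmf <- pmf P from the uniform pmf until successive iterates agree."""
    n = len(P)
    pmf = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(pmf, P)
        if max(abs(a - b) for a, b in zip(pmf, nxt)) < tol:
            return nxt
        pmf = nxt
    return pmf

# Link-following matrix of the example (first matrix in the expression for P).
LINKS = [
    [0, 1/2, 1/2, 0, 0],
    [1/5, 1/5, 1/5, 1/5, 1/5],
    [1/3, 1/3, 0, 1/3, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 1/2, 0, 1/2],
]

P = pagerank_matrix(LINKS)
pi = stationary_by_power_iteration(P)
print([round(x, 5) for x in pi])
```

Because every entry of P is at least 0.03, the iteration converges geometrically from any starting pmf, which is the practical reason for mixing in the uniform jump.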
For periodic processes, we have the following result. 
Theorem 2 
For an irreducible, periodic, and positive recurrent Markov chain with period d, 
\[
\lim_{n \to \infty} p_{jj}(nd) = d\pi_j \qquad \text{for all } j \tag{11.28}
\]
where $\pi_j$ is the unique nonnegative solution of Eqs. (11.26a) and (11.26b).


As before, $\pi_j$ represents the proportion of time spent in state j. However, the fact
that state j is constrained to occur at multiples of d steps implies that the probability of 
occurrence of the state j is d times greater at the allowable times and zero elsewhere. 
Example 11.31


In Examples 11.26 and 11.28 we found that the long-term proportion of time spent in state 0 is
$\pi_0 = 1/3$. If we start in state 0, then only even states can occur at even time instants. Thus at these
even time instants the probability of state 0 is 2/3 and of state 2 is 1/3. At odd time instants, the
probabilities of states 0 and 2 are zero.


Theorems 1 and 2 only address the most important cases of irreducible, periodic 
and aperiodic Markov chains indicated by the checkmarks in Fig. 11.7. The following 
example considers a case not covered by Theorems 1 and 2. 


Example 11.32 Markov Chain with Multiple Irreducible Classes 
Does the Markov chain in Fig. 11.6(a) have a unique stationary pmf? 
The equations for the stationary probabilities are: 
\[
\begin{aligned}
p_0 &= \tfrac{1}{2} p_0 \\
p_1 &= \tfrac{9}{10} p_1 + \tfrac{1}{5} p_2 \\
p_2 &= \tfrac{1}{4} p_0 + \tfrac{1}{10} p_1 + \tfrac{4}{5} p_2 \\
p_3 &= \tfrac{1}{4} p_0 + p_3.
\end{aligned}
\]

The first equation implies that $p_0 = 0$, which reduces the fourth equation to $p_3 = p_3$, which imposes
no constraints on $p_3$. The middle two equations are equivalent and both imply that
$p_1 = 2p_2$. The normalization condition requires that $1 = p_1 + p_2 + p_3 = 3p_2 + p_3$. Therefore
the equations are underdetermined and there are many solutions with the form:
\[
(0,\ 2p_2,\ p_2,\ 1 - 3p_2) \qquad \text{where } 0 \le p_2 \le 1/3.
\]

Now let’s approach the problem according to its three classes: {0}, {1,2}, and {3}. The 
first class is transient and the other two classes are recurrent. Suppose the initial state is 3, then 
the process remains in that state forever. The stationary pmf for class {3} by itself is (0, 0, 0, 1). If 
the initial state is 1 or 2, then the process remains in this class forever; the stationary pmf for this 
class in isolation is (0, 2/3, 1/3, 0). Finally if the initial state is 0, then the process will eventually 
leave and enter one of the other two classes with equal probability. In the general case, if the initial
state is selected according to the pmf $(p_0(0), p_1(0), p_2(0), p_3(0))$ then the class {1, 2} will be
entered with probability $\tfrac{1}{2} p_0(0) + p_1(0) + p_2(0)$, and class {3} will be entered with probability
$\tfrac{1}{2} p_0(0) + p_3(0)$. The stationary pmf would then have the form:

\[
\begin{aligned}
&\{\tfrac{1}{2} p_0(0) + p_1(0) + p_2(0)\}(0, 2/3, 1/3, 0) + \{\tfrac{1}{2} p_0(0) + p_3(0)\}(0, 0, 0, 1) \\
&\qquad = \gamma\,(0, 2/3, 1/3, 0) + (1 - \gamma)(0, 0, 0, 1) \\
&\qquad = (0,\ 2\gamma/3,\ \gamma/3,\ 1 - \gamma).
\end{aligned}
\]

If we let $\gamma/3 = p_2$ we see that this solution has the form we derived before.

For example, suppose the initial pmf was (0, 1/3, 1/6, 1/2), then this pmf satisfies the condi- 
tion for a stationary pmf and the repeated multiplication by P will yield the same pmf. In this 
sense this multiclass Markov chain has a stationary pmf. Note however that the relative frequen- 
cies of the states depend on which irreducible class is actually entered. Thus if we record long- 
term average frequencies we will observe either (0, 2/3, 1/3, 0) or (0, 0, 0, 1). The stationary pmf 
does not correspond to either of these two pmf’s; instead the stationary pmf gives us the expected
value of the two pmf’s: 


(0, 1/3, 1/6, 1/2) = 1/2(0, 2/3, 1/3, 0) + 1/2(0, 0, 0, 1) 


where 1/2 is the probability of entering each of the two irreducible classes for this choice of initial pmf.
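
The family of stationary pmf's in Example 11.32 can be verified directly. The sketch below assumes the transition matrix implied by the stationary equations of the example (row i lists the probabilities of leaving state i); the helper names are illustrative. It checks that every mixture of the two class pmf's is left unchanged by P, and that starting in state 0 leads to the mixture with weight 1/2 on each recurrent class.

```python
# Sketch for Example 11.32: a chain with one transient and two recurrent classes.
# The matrix is read off from the stationary equations; helper names are assumptions.

P = [
    [1/2, 0.0, 1/4, 1/4],    # state 0 (transient)
    [0.0, 9/10, 1/10, 0.0],  # state 1, class {1, 2}
    [0.0, 1/5, 4/5, 0.0],    # state 2, class {1, 2}
    [0.0, 0.0, 0.0, 1.0],    # state 3 (absorbing)
]

def step(pmf, P):
    """One step of the chain: the row vector pmf multiplied by P."""
    n = len(P)
    return [sum(pmf[i] * P[i][j] for i in range(n)) for j in range(n)]

def mixture(gamma):
    """Weight gamma on the pmf of class {1, 2} and 1 - gamma on class {3}."""
    return [0.0, 2.0 * gamma / 3.0, gamma / 3.0, 1.0 - gamma]

def is_stationary(pmf, P, tol=1e-12):
    """True if pmf is unchanged (up to tol) by one multiplication by P."""
    return all(abs(a - b) <= tol for a, b in zip(pmf, step(pmf, P)))

def evolve(pmf, P, num_steps):
    """State pmf after num_steps transitions."""
    for _ in range(num_steps):
        pmf = step(pmf, P)
    return pmf

all_stationary = all(is_stationary(mixture(g), P) for g in (0.0, 0.25, 0.5, 1.0))
limit_from_state_0 = evolve([1.0, 0.0, 0.0, 0.0], P, 500)
print(all_stationary, [round(x, 4) for x in limit_from_state_0])
```

The state pmf converges here because both recurrent classes are aperiodic; the relative frequencies along a single sample path still settle on one class pmf or the other, as the example explains.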


Example 11.32 illustrates the behavior of multiclass finite-state Markov chains. In
these chains the process will eventually enter and remain forever in one of its recurrent 
classes. Each recurrent class can be considered as a separate irreducible Markov chain 
with its own stationary pmf. The multiclass Markov chain will then have stationary 
pmf’s that depend on the stationary pmf’s of its constituent recurrent classes according 
to the initial state probabilities. These multiclass Markov chains are not ergodic since 
the relative frequencies of the states do not correspond to the stationary pmf. 

If a multiclass chain has infinite state space, then the situation discussed above 
can occur as a special case: the process initially works its way through transient classes 
and eventually settles in one of a number of ergodic classes. However, in general, it is 
possible for some or all of the classes to be transient and/or null recurrent. In such case 
the process may never settle into stationary behavior. 


11.4 CONTINUOUS-TIME MARKOV CHAINS


In Section 11.2 we saw that the transition probability matrix determines the behavior 
of a discrete-time Markov chain. In this section we see that the same is true for contin- 
uous-time Markov chains. 

The joint pmf for k + 1 arbitrary time instants of a Markov chain is given by 
Eq. (11.3): 


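\[
\begin{aligned}
&P[X(t_{k+1}) = x_{k+1}, X(t_k) = x_k, \ldots, X(t_1) = x_1] \\
&\qquad = P[X(t_{k+1}) = x_{k+1} \mid X(t_k) = x_k] \cdots \\
&\qquad\quad \times P[X(t_2) = x_2 \mid X(t_1) = x_1]\, P[X(t_1) = x_1].
\end{aligned}
\tag{11.29}
\]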

This result holds regardless of whether the process is discrete-time or continuous-time. 
In the continuous-time case, Eq. (11.29) requires that we know the transition probabil- 
ities from an arbitrary time s to an arbitrary time s + t: 


\[
P[X(s + t) = j \mid X(s) = i], \qquad t \ge 0.
\]


We assume here that the transition probabilities depend only on the difference 
between the two times: 


\[
P[X(s + t) = j \mid X(s) = i] = P[X(t) = j \mid X(0) = i] = p_{ij}(t), \qquad t \ge 0, \text{ all } s. \tag{11.30}
\]


We say that X(t) has homogeneous transition probabilities. 
Let $P(t) = \{p_{ij}(t)\}$ denote the matrix of transition probabilities in an interval of
length t. Since $p_{ii}(0) = 1$ and $p_{ij}(0) = 0$ for $i \ne j$, we have

\[
P(0) = I, \tag{11.31}
\]

where I is the identity matrix.


Example 11.33 Poisson Process 


For the Poisson process, the transition probabilities satisfy 


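\[
\begin{aligned}
p_{ij}(t) &= P[\,j - i \text{ events in } t \text{ seconds}\,] \\
&= p_{0,j-i}(t) \\
&= \frac{(\alpha t)^{j-i}}{(j-i)!}\, e^{-\alpha t}, \qquad j \ge i.
\end{aligned}
\]
Therefore
\[
P(t) = \begin{bmatrix}
e^{-\alpha t} & \alpha t\, e^{-\alpha t} & (\alpha t)^2 e^{-\alpha t}/2! & \cdots \\
0 & e^{-\alpha t} & \alpha t\, e^{-\alpha t} & \cdots \\
0 & 0 & e^{-\alpha t} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}.
\]

As t approaches zero, $e^{-\alpha t} \approx 1 - \alpha t$. Thus for a small time interval $\delta$,

\[
P(\delta) \approx \begin{bmatrix}
1 - \alpha\delta & \alpha\delta & 0 & \cdots \\
0 & 1 - \alpha\delta & \alpha\delta & \cdots \\
0 & 0 & 1 - \alpha\delta & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix},
\]

where all terms of order $\delta^2$ or higher have been neglected. Thus the probability of more than one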
transition in a very short time interval is negligible. Note that this is consistent with the assump- 
tions made in deriving the Poisson process in Section 9.4. 


Example 11.34 Random Telegraph 


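In the random telegraph example, the process X(t) changes with each occurrence of an event
in a Poisson process. From Eqs. (9.40) and (9.41) we see that the transition probabilities are as
follows:
\[
P[X(t) = a \mid X(0) = a] = \tfrac{1}{2}\{1 + e^{-2\alpha t}\}
\]
\[
P[X(t) = a \mid X(0) = b] = \tfrac{1}{2}\{1 - e^{-2\alpha t}\} \qquad \text{if } a \ne b.
\]

Thus the transition probability matrix is

\[
P(t) = \begin{bmatrix}
\tfrac{1}{2}\{1 + e^{-2\alpha t}\} & \tfrac{1}{2}\{1 - e^{-2\alpha t}\} \\
\tfrac{1}{2}\{1 - e^{-2\alpha t}\} & \tfrac{1}{2}\{1 + e^{-2\alpha t}\}
\end{bmatrix}.
\]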
11.4.1 State Occupancy Times


Since the random telegraph signal changes polarity with each occurrence of an event in
a Poisson process, it follows that the time spent in each state is an exponential random
variable. It turns out that this is a property of the state occupancy time for all continuous-time
Markov chains, that is: X(t) remains at a given value (state) for an exponentially
distributed random time. To see why, let $T_i$ be the time spent in a state i. The
probability of spending more than t seconds in this state is then

\[
P[T_i > t].
\]

Now suppose that the process has already been in state i for s seconds; then the probability
of spending t more seconds in this state is

\[
P[T_i > t + s \mid T_i > s] = P[T_i > t + s \mid X(s') = i,\ 0 \le s' \le s],
\]

since $\{T_i > s\}$ implies that the system has been in state i during the time interval
(0, s). The Markov property implies that if X(s) = i, then the past is irrelevant and we
can view the system as being restarted in state i at time s:

\[
P[T_i > t + s \mid T_i > s] = P[T_i > t]. \tag{11.32}
\]

Only the exponential random variable satisfies this memoryless property (see
Section 4.4). Thus the time spent in state i is an exponential random variable with
some mean $1/v_i$:

\[
P[T_i > t] = e^{-v_i t}. \tag{11.33}
\]

The mean state occupancy time $1/v_i$ will usually be different for each state.

The above result provides us with another way of looking at continuous-time
Markov chains. Each time a state, say i, is entered, an exponentially distributed state occupancy
time $T_i$ is selected. When the time is up, the next state j is selected according to a
discrete-time Markov chain, with transition probabilities $\tilde{q}_{ij}$. Then the new state occupancy
time is selected according to $T_j$, and so on.³ We call $\tilde{q}_{ij}$ an embedded Markov chain. We will
see in the last part of this section that the properties of the continuous-time Markov chain
depend on the class properties of its embedded chain.


Example 11.35 


The random telegraph signal in Example 11.34 spends an exponentially distributed time with
mean $1/\alpha$ in each state. When a transition occurs, the transition is always from the present state
to the only other state, thus the embedded Markov chain is

\[
\tilde{q}_{00} = 0 \qquad \tilde{q}_{01} = 1
\]
\[
\tilde{q}_{10} = 1 \qquad \tilde{q}_{11} = 0.
\]


3This view of Markov chains is useful in setting up computer simulation models of Markov chain processes.
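
The two-step view of a continuous-time Markov chain (draw an exponential occupancy time, then pick the next state from the embedded chain) translates directly into a simulation loop. The following is a minimal sketch under stated assumptions: the function name, the two-state example, and the rates ALPHA = 1 and BETA = 3 are hypothetical choices for illustration, not values from the text.

```python
# Sketch: simulating a continuous-time Markov chain from its occupancy rates
# and embedded chain. The rates and names below are illustrative assumptions.
import random

def simulate_ctmc(rates, embedded, start, num_jumps, rng):
    """Return the total time spent in each state over num_jumps transitions.

    rates[i]       -- rate v_i of the exponential occupancy time in state i
    embedded[i][j] -- embedded-chain probability of jumping from i to j
    """
    states = list(range(len(rates)))
    time_in_state = [0.0] * len(rates)
    state = start
    for _ in range(num_jumps):
        # Occupancy time in the current state: exponential with mean 1 / v_i.
        time_in_state[state] += rng.expovariate(rates[state])
        # Next state chosen according to the embedded discrete-time chain.
        state = rng.choices(states, weights=embedded[state])[0]
    return time_in_state

ALPHA, BETA = 1.0, 3.0                  # hypothetical rates out of states 0 and 1
rates = [ALPHA, BETA]
embedded = [[0.0, 1.0], [1.0, 0.0]]     # a two-state chain must alternate

rng = random.Random(1)                  # fixed seed for repeatability
occupancy = simulate_ctmc(rates, embedded, start=0, num_jumps=20_000, rng=rng)
fractions = [t / sum(occupancy) for t in occupancy]
print(fractions)
```

For this alternating chain the long-run fraction of time in state 0 should be close to (1/ALPHA) / (1/ALPHA + 1/BETA) = 0.75, since each visit to state 0 is followed by exactly one visit to state 1.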
11.4.2 Transition Rates and Time-Dependent State Probabilities


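Consider the transition probabilities in a very short time interval of duration $\delta$ seconds.
The probability that the process remains in state i during the interval is

\[
\begin{aligned}
P[T_i > \delta] &= e^{-v_i \delta} \\
&= 1 - \frac{v_i \delta}{1!} + \frac{v_i^2 \delta^2}{2!} - \cdots \\
&= 1 - v_i \delta + o(\delta),
\end{aligned}
\]

where $o(\delta)$ denotes terms that become negligible relative to $\delta$ as $\delta$ approaches zero.
The exponential distributions of the state occupancy times imply that the probability
of two or more transitions in an interval of duration $\delta$ is $o(\delta)$. Thus for small $\delta$, $p_{ii}(\delta)$ is
approximately equal to the probability that the process remains in state i for $\delta$ seconds:

\[
\begin{aligned}
p_{ii}(\delta) &= P[T_i > \delta] + o(\delta) \\
&= 1 - v_i \delta + o(\delta)
\end{aligned}
\]
or equivalently,
\[
1 - p_{ii}(\delta) = v_i \delta + o(\delta). \tag{11.34}
\]

We call $v_i$ the rate at which the process X(t) leaves state i.
Once the process leaves state i, it will enter state j with probability $\tilde{q}_{ij}$, where $\tilde{q}_{ij}$
is the transition probability of the embedded Markov chain. Thus

\[
\begin{aligned}
p_{ij}(\delta) &= (1 - p_{ii}(\delta))\,\tilde{q}_{ij} \\
&= v_i \tilde{q}_{ij}\, \delta + o(\delta) \\
&= \gamma_{ij}\, \delta + o(\delta).
\end{aligned}
\tag{11.35a}
\]

We call $\gamma_{ij} = v_i \tilde{q}_{ij}$ the rate at which the process X(t) enters state j from state i. For completeness,
we define $\gamma_{ii} = -v_i$, so that by Eq. (11.34),

\[
p_{ii}(\delta) - 1 = \gamma_{ii}\, \delta + o(\delta). \tag{11.35b}
\]
If we divide both sides of Eqs. (11.35a) and (11.35b) by $\delta$ and take the limit $\delta \to 0$,
we obtain
\[
\lim_{\delta \to 0} \frac{p_{ij}(\delta)}{\delta} = \gamma_{ij} \qquad i \ne j \tag{11.36a}
\]
and
\[
\lim_{\delta \to 0} \frac{p_{ii}(\delta) - 1}{\delta} = \gamma_{ii}, \tag{11.36b}
\]
since
\[
\lim_{\delta \to 0} \frac{o(\delta)}{\delta} = 0,
\]
because $o(\delta)$ is of order higher than $\delta$.


4A function g(h) is said to be o(h) if $\lim_{h \to 0} g(h)/h = 0$, that is, g(h) goes to zero faster than h does.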
FIGURE 11.10
Transitions into state j.


We are now ready to develop a set of equations for finding the state probabilities
at time t, which will be denoted by

\[
p_j(t) = P[X(t) = j].
\]
For $\delta > 0$, we have (see Fig. 11.10)
\[
\begin{aligned}
p_j(t + \delta) &= P[X(t + \delta) = j] \\
&= \sum_i P[X(t + \delta) = j \mid X(t) = i]\, P[X(t) = i] \\
&= \sum_i p_{ij}(\delta)\, p_i(t).
\end{aligned}
\tag{11.37}
\]
If we subtract $p_j(t)$ from both sides, we obtain
\[
p_j(t + \delta) - p_j(t) = \sum_{i \ne j} p_{ij}(\delta)\, p_i(t) + (p_{jj}(\delta) - 1)\, p_j(t). \tag{11.38}
\]

If we divide by $\delta$, apply Eqs. (11.36a) and (11.36b) and let $\delta \to 0$, we obtain

\[
p_j'(t) = \sum_i \gamma_{ij}\, p_i(t). \tag{11.39}
\]

Equation (11.39) is a form of the Chapman-Kolmogorov equations for continuous-time
Markov chains. To find $p_j(t)$ we need to solve this system of differential equations
with initial conditions specified by the initial state pmf $\{p_j(0),\ j = 0, 1, \ldots\}$.

Note that if we solve Eq. (11.39) under the assumption that the state at time zero
was i, that is, with initial condition $p_i(0) = 1$ and $p_j(0) = 0$ for all $j \ne i$, then the solution
is actually $p_{ij}(t)$, the ij component of P(t). Thus Eq. (11.39) can also be used to find
the transition probability matrix.
Example 11.36 A Simple Queueing System


A queueing system alternates between two states. In state 0, the system is idle and waiting for a
customer to arrive. This idle time is an exponential random variable with mean $1/\alpha$. In state 1, the
system is busy servicing a customer. The time in the busy state is an exponential random variable
with mean $1/\beta$. Find the state probabilities $p_0(t)$ and $p_1(t)$ in terms of the initial state probabilities
$p_0(0)$ and $p_1(0)$.

The system moves from state 0 to state 1 at a rate $\alpha$, and from state 1 to state 0 at a
rate $\beta$:

\[
\gamma_{00} = -\alpha \qquad \gamma_{01} = \alpha
\]
\[
\gamma_{10} = \beta \qquad \gamma_{11} = -\beta.
\]
Equation (11.39) then gives
\[
p_0'(t) = -\alpha p_0(t) + \beta p_1(t)
\]
\[
p_1'(t) = \alpha p_0(t) - \beta p_1(t).
\]
Since $p_0(t) + p_1(t) = 1$, the first equation becomes
\[
p_0'(t) = -\alpha p_0(t) + \beta(1 - p_0(t)),
\]

which is a first-order differential equation:

\[
p_0'(t) + (\alpha + \beta)\, p_0(t) = \beta, \qquad \text{with initial condition } p_0(0).
\]

The general solution of this equation is

\[
p_0(t) = \frac{\beta}{\alpha + \beta} + C e^{-(\alpha + \beta)t}.
\]

We obtain C by setting t = 0 and solving in terms of $p_0(0)$; then we find

\[
p_0(t) = \frac{\beta}{\alpha + \beta} + \left(p_0(0) - \frac{\beta}{\alpha + \beta}\right) e^{-(\alpha + \beta)t}
\]
and
\[
p_1(t) = \frac{\alpha}{\alpha + \beta} + \left(p_1(0) - \frac{\alpha}{\alpha + \beta}\right) e^{-(\alpha + \beta)t}.
\]
Note that as $t \to \infty$,
\[
p_0(t) \to \frac{\beta}{\alpha + \beta} \qquad \text{and} \qquad p_1(t) \to \frac{\alpha}{\alpha + \beta}.
\]

Thus as $t \to \infty$, the state probabilities approach constant values that are independent of the initial
state probabilities.
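
The closed-form solution of Example 11.36 can be compared against a direct numerical integration of the two differential equations. This is a sketch only: it uses a plain forward-Euler step, and the rates ALPHA = 1 and BETA = 3, the initial probability, and the step count are hypothetical values chosen for illustration.

```python
# Sketch: numerical check of the two-state solution in Example 11.36.
# Rates, initial condition, and step count are illustrative assumptions.
import math

def closed_form(t, p0_init, alpha, beta):
    """State probabilities (p0(t), p1(t)) from the closed-form solution."""
    limit0 = beta / (alpha + beta)
    p0 = limit0 + (p0_init - limit0) * math.exp(-(alpha + beta) * t)
    return p0, 1.0 - p0

def euler_integrate(t_end, p0_init, alpha, beta, steps=100_000):
    """Forward-Euler integration of p0' = -alpha p0 + beta p1, p1' = alpha p0 - beta p1."""
    dt = t_end / steps
    p0, p1 = p0_init, 1.0 - p0_init
    for _ in range(steps):
        d0 = -alpha * p0 + beta * p1
        d1 = alpha * p0 - beta * p1
        p0, p1 = p0 + dt * d0, p1 + dt * d1
    return p0, p1

ALPHA, BETA = 1.0, 3.0   # hypothetical rates
exact = closed_form(0.5, 1.0, ALPHA, BETA)
approx = euler_integrate(0.5, 1.0, ALPHA, BETA)
steady = closed_form(50.0, 1.0, ALPHA, BETA)
print(exact, approx, steady)
```

A higher-order integrator would reach the same accuracy with far fewer steps; forward Euler is used here only because it mirrors the small-interval argument that led to Eq. (11.39).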


Example 11.37 The Poisson Process 


Find the state probabilities for the Poisson process. 
The Poisson process moves only from state i to state i + 1 at a rate $\alpha$.
Thus
\[
\gamma_{ii} = -\alpha \qquad \text{and} \qquad \gamma_{i,i+1} = \alpha.
\]

Equation (11.39) then gives
\[
p_0'(t) = -\alpha p_0(t) \qquad \text{for } j = 0
\]
\[
p_j'(t) = -\alpha p_j(t) + \alpha p_{j-1}(t) \qquad \text{for } j \ge 1.
\]

The initial condition for the Poisson process is $p_0(0) = 1$, so the solution for the j = 0 equation is

\[
p_0(t) = e^{-\alpha t}.
\]
The equation for j = 1 is

\[
p_1'(t) = -\alpha p_1(t) + \alpha e^{-\alpha t}, \qquad p_1(0) = 0,
\]

which is also a first-order differential equation for which the solution is

\[
p_1(t) = \frac{\alpha t}{1!}\, e^{-\alpha t}.
\]
It can be shown by an induction argument that the solution of the state j equation is
\[
p_j(t) = \frac{(\alpha t)^j}{j!}\, e^{-\alpha t}.
\]

For any fixed time t, the sum of $\{p_j(t)\}$ is one. Note however, that for any j, $p_j(t) \to 0$ as $t \to \infty$.
Figure 11.11 shows how the pmf drifts to higher values as time progresses. Thus for the Poisson
process, the probability of any finite state approaches zero as $t \to \infty$. This is consistent with the
fact that the process grows steadily with time.


FIGURE 11.11
State pmf of Poisson process vs. time.
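
The induction argument mentioned in Example 11.37 can be sketched with an integrating factor. The steps below are one possible route, written with the rate symbol $\alpha$ and the state probabilities $p_j(t)$ of the example; they are offered as a worked illustration rather than as the derivation intended by the text.

Multiplying the state j equation by $e^{\alpha t}$ gives
\[
\frac{d}{dt}\left[e^{\alpha t} p_j(t)\right] = e^{\alpha t}\left[p_j'(t) + \alpha p_j(t)\right] = \alpha\, e^{\alpha t} p_{j-1}(t).
\]
Suppose that $p_{j-1}(t) = \dfrac{(\alpha t)^{j-1}}{(j-1)!}\, e^{-\alpha t}$, which holds for j = 1 because $p_0(t) = e^{-\alpha t}$. Then
\[
\frac{d}{dt}\left[e^{\alpha t} p_j(t)\right] = \frac{\alpha^j t^{j-1}}{(j-1)!}.
\]
Integrating from 0 to t and using $p_j(0) = 0$ for $j \ge 1$,
\[
e^{\alpha t} p_j(t) = \frac{\alpha^j t^j}{j!}
\qquad \Longrightarrow \qquad
p_j(t) = \frac{(\alpha t)^j}{j!}\, e^{-\alpha t},
\]
which completes the induction step.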
11.4.3 Steady State Probabilities and Global Balance Equations


As $t \to \infty$, the state probabilities in the two-state queueing system in Example 11.36
converge to a pmf that does not depend on the initial conditions. This is typical of
systems that reach “equilibrium” or “steady state.” For such a system, $p_j(t) \to p_j$ and
$p_j'(t) \to 0$, so Eq. (11.39) becomes

\[
0 = \sum_i \gamma_{ij}\, p_i \qquad \text{for all } j, \tag{11.40a}
\]
or equivalently, recalling that $\gamma_{jj} = -v_j$,
\[
v_j p_j = \sum_{i \ne j} \gamma_{ij}\, p_i \qquad \text{for all } j, \tag{11.40b}
\]
where
\[
\sum_j p_j = 1. \tag{11.40c}
\]

Equation (11.40b) can be rewritten as follows:

\[
p_j \Bigl(\sum_{i \ne j} \gamma_{ji}\Bigr) = \sum_{i \ne j} \gamma_{ij}\, p_i \tag{11.40d}
\]
since
\[
v_j = \sum_{i \ne j} \gamma_{ji}.
\]

The system of linear equations given by Eq. (11.40b) or (11.40d) is called the global
balance equations. These equations state that at equilibrium, the rate of probability
flow out of state j, namely $v_j p_j$, is equal to the rate of flow into state j, as shown in
Fig. 11.12. By solving this set of linear equations we can obtain the stationary state pmf
of the system (when it exists).⁵

We refer to $p = \{p_j\}$ as the stationary state pmf of the Markov chain. Since p satisfies
Eq. (11.39), if we start the Markov chain with initial state pmf given by p, then the
state probabilities will be

\[
p_j(t) = p_j \qquad \text{for all } t.
\]


FIGURE 11.12 
Global balance of probability flows. 


5The last part of this section discusses conditions under which the stationary pmf exists. 
The resulting process is a stationary random process as defined in Section 9.6 since the
probability of the sequence of states $i_0, i_1, \ldots, i_n$ at times $t < t_1 + t < \cdots < t_n + t$ is,
by Eq. (11.29),

\[
\begin{aligned}
&P[X(t) = i_0, X(t_1 + t) = i_1, \ldots, X(t_n + t) = i_n] \\
&\qquad = P[X(t_n + t) = i_n \mid X(t_{n-1} + t) = i_{n-1}] \cdots \\
&\qquad\quad \times P[X(t_1 + t) = i_1 \mid X(t) = i_0]\, P[X(t) = i_0].
\end{aligned}
\]

The transition probabilities depend only on the difference between the associated times.
Thus the above joint probability depends on the choice of origin only through
$P[X(t) = i_0]$. But $P[X(t) = i_0] = p_{i_0}$ for all t. Therefore we conclude that the above joint
probability is independent of the choice of time origin and thus that the process is stationary.


Example 11.38 


Find the stationary state pmf for the two-state queueing system discussed in Example 11.36. 
Equation (11.40b) for this system gives 


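\[
\alpha p_0 = \beta p_1 \qquad \text{and} \qquad \beta p_1 = \alpha p_0.
\]

Noting that $p_0 + p_1 = 1$, we obtain

\[
p_0 = \frac{\beta}{\alpha + \beta} \qquad \text{and} \qquad p_1 = \frac{\alpha}{\alpha + \beta}.
\]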


Example 11.39 The M/M/1 Single-Server Queueing System 


Consider a queueing system in which customers are served one at a time in order of arrival. The 
time between customer arrivals is exponentially distributed with rate $\lambda$, and the time required to
service a customer is exponentially distributed with rate $\mu$. Find the steady state pmf for the
number of customers in the system.

The state transition rates are as follows. Customers arrive at a rate $\lambda$, so

\[
\gamma_{i,i+1} = \lambda \qquad i = 0, 1, 2, \ldots.
\]
When the system is nonempty, customers depart at the rate $\mu$. Thus
\[
\gamma_{i,i-1} = \mu \qquad i = 1, 2, 3, \ldots.
\]

The transition rate diagram is shown in Fig. 11.13.


FIGURE 11.13
Transition rate diagram for M/M/1 queueing system.


The global balance equations are

\[
\lambda p_0 = \mu p_1 \qquad \text{for } j = 0 \tag{11.41a}
\]
\[
(\lambda + \mu)\, p_j = \lambda p_{j-1} + \mu p_{j+1} \qquad \text{for } j = 1, 2, \ldots. \tag{11.41b}
\]

We can rewrite Eq. (11.41b) as follows:

\[
\lambda p_j - \mu p_{j+1} = \lambda p_{j-1} - \mu p_j \qquad \text{for } j = 1, 2, \ldots,
\]

which implies that
\[
\lambda p_{j-1} - \mu p_j = \text{constant} \qquad \text{for } j = 1, 2, \ldots. \tag{11.42}
\]

Equation (11.42) with j = 1 and Eq. (11.41a) together imply that
\[
\text{constant} = \lambda p_0 - \mu p_1 = 0.
\]

Thus Eq. (11.42) becomes
\[
\lambda p_{j-1} = \mu p_j,
\]

or equivalently,
\[
p_j = \rho\, p_{j-1} \qquad j = 1, 2, \ldots
\]

and by a simple induction argument
\[
p_j = \rho^j p_0,
\]
where $\rho = \lambda/\mu$. We obtain $p_0$ by noting that the sum of the probabilities must be one:

\[
1 = \sum_{j=0}^{\infty} p_j = (1 + \rho + \rho^2 + \cdots)\, p_0 = \frac{1}{1 - \rho}\, p_0,
\]

where the series converges if and only if $\rho < 1$.
Thus

\[
p_j = (1 - \rho)\rho^j \qquad j = 0, 1, 2, \ldots. \tag{11.43}
\]
This queueing system is discussed in detail in Section 12.3.


The condition for the existence of a steady state solution has a simple explanation. The
condition $\rho < 1$ is equivalent to

\[
\lambda < \mu,
\]


that is, the rate at which customers arrive must be less than the rate at which the system can 
process them. Otherwise the queue builds up without limit as time progresses. 
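
The geometric pmf of Example 11.39 can be substituted back into the global balance equations as a sanity check. The sketch below does this numerically for a finite number of states; the function names and the rates LAM = 2 and MU = 5 are hypothetical, and Eq. (11.41b) is taken in the form (lam + mu) p_j = lam p_{j-1} + mu p_{j+1}, which is consistent with the rewritten form in the example.

```python
# Sketch: checking the M/M/1 stationary pmf against the global balance equations.
# Rates and helper names are illustrative assumptions.

def mm1_stationary_pmf(lam, mu, num_states):
    """Return [p_0, ..., p_{num_states-1}] with p_j = (1 - rho) * rho**j, rho = lam / mu."""
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("no stationary pmf: the arrival rate must be less than the service rate")
    return [(1.0 - rho) * rho**j for j in range(num_states)]

def balance_residuals(p, lam, mu):
    """Residuals of lam*p_0 = mu*p_1 and (lam+mu)*p_j = lam*p_{j-1} + mu*p_{j+1}."""
    residuals = [lam * p[0] - mu * p[1]]
    for j in range(1, len(p) - 1):
        residuals.append((lam + mu) * p[j] - lam * p[j - 1] - mu * p[j + 1])
    return residuals

LAM, MU = 2.0, 5.0   # hypothetical arrival and service rates (rho = 0.4)
p = mm1_stationary_pmf(LAM, MU, 60)
worst = max(abs(r) for r in balance_residuals(p, LAM, MU))
print(worst, sum(p))
```

With rho = 0.4 the first 60 terms already carry essentially all of the probability, so the partial sum is numerically indistinguishable from one.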


Example 11.40 A Birth-and-Death Process 


A birth-and-death process is a Markov chain in which only transitions between adjacent states 
occur as shown in Fig. 11.14. The single-server queueing system discussed in Example 11.39 is an 
example of a birth-and-death process. 

The global balance equations for a general birth-and-death process are 


\[
\lambda_0 p_0 = \mu_1 p_1 \qquad j = 0 \tag{11.44a}
\]
\[
\lambda_j p_j - \mu_{j+1} p_{j+1} = \lambda_{j-1} p_{j-1} - \mu_j p_j \qquad j = 1, 2, \ldots. \tag{11.44b}
\]


FIGURE 11.14
Transition rate diagram for general birth-and-death process.

As in the previous example, it then follows that
\[
p_j = r_j p_{j-1} \qquad j = 1, 2, \ldots
\]
and
\[
p_j = r_j r_{j-1} \cdots r_1 p_0 \qquad j = 1, 2, \ldots, \tag{11.45}
\]

where $r_j = \lambda_{j-1}/\mu_j$. If we define
\[
R_j = r_j r_{j-1} \cdots r_1 \qquad \text{and} \qquad R_0 = 1,
\]

then $p_0$ is found from

\[
1 = \Bigl(\sum_{j=0}^{\infty} R_j\Bigr) p_0.
\]

If the series in the above equation converges, then the stationary pmf is given by

\[
p_j = \frac{R_j}{\sum_{i=0}^{\infty} R_i}. \tag{11.46}
\]

If the series does not converge, then a stationary pmf does not exist, and $p_j = 0$ for all j. In
Chapter 12, we will see that many useful queueing systems can be modeled by birth-and-death
processes.


11.4.4 Limiting Probabilities for Continuous-Time Markov Chains


We saw above that a continuous-time Markov chain X(t) can be viewed as consisting of
a sequence of states determined by some discrete-time Markov chain $X_n$ with transition
probabilities $\tilde{q}_{ij}$ and a corresponding sequence of exponentially distributed state
occupancy times. In this section we use this approach to investigate the limiting probabilities
of continuous-time Markov chains.

First we consider the construction of stationary solutions for X(t) from the steady
state solutions of $X_n$. Suppose that the embedded Markov chain $X_n$ is irreducible
and positive recurrent, so that Eq. (11.25) holds. Let $N_i(n)$ denote the number of
times state i occurs in the first n transitions, and let $T_i(j)$ denote the occupancy time
the jth time state i occurs. The proportion of time spent by X(t) in state i after the
first n transitions is

\[
\frac{\text{time spent in state } i}{\text{time spent in all states}}
= \frac{\displaystyle\sum_{j=1}^{N_i(n)} T_i(j)}{\displaystyle\sum_i \sum_{j=1}^{N_i(n)} T_i(j)}
= \frac{\displaystyle\frac{N_i(n)}{n}\, \frac{1}{N_i(n)} \sum_{j=1}^{N_i(n)} T_i(j)}{\displaystyle\sum_i \frac{N_i(n)}{n}\, \frac{1}{N_i(n)} \sum_{j=1}^{N_i(n)} T_i(j)}.
\tag{11.47}
\]
As $n \to \infty$, by Eqs. (11.25) and (11.26ab), with probability one,
\[
\frac{N_i(n)}{n} \to \pi_i, \tag{11.48}
\]

the stationary pmf of the embedded Markov chain. In addition, we also have that
$N_i(n) \to \infty$ as $n \to \infty$, so that by the strong law of large numbers, with probability one,

\[
\frac{1}{N_i(n)} \sum_{j=1}^{N_i(n)} T_i(j) \to E[T_i] = 1/v_i, \tag{11.49}
\]

where we have used the fact that the state occupancy time in state i has mean $1/v_i$.
Similarly the denominator in Eq. (11.47) must approach $\sum_j \pi_j / v_j$. Equations (11.48)
and (11.49) when applied to Eq. (11.47) imply that if $\sum_j \pi_j / v_j < \infty$, with probability
one, the long-term proportion of time spent in state i approaches

\[
p_i = \frac{\pi_i / v_i}{\sum_j \pi_j / v_j} = c\, \pi_i / v_i, \tag{11.50}
\]
where $\pi_j$ is the unique pmf solution to

\[
\pi_j = \sum_i \pi_i\, \tilde{q}_{ij} \qquad \text{for all } j \tag{11.51}
\]

and c is a normalization constant.
We obtain the global balance equation, Eq. (11.40b), by substituting $\pi_j = v_j p_j / c$
from Eq. (11.50) and $\tilde{q}_{ij} = \gamma_{ij} / v_i$ into Eq. (11.51):

\[
v_j p_j = \sum_{i \ne j} p_i\, \gamma_{ij} \qquad \text{for all } j.
\]

Thus the $p_j$'s are the unique solution of the global balance equations.
We have proved the following result:


Theorem 3


Assume a time-continuous Markov chain, for which the embedded Markov chain is irreducible
and positive recurrent with stationary pmf $\{\pi_j\}$ and $\sum_j \pi_j / v_j < \infty$, then the following assertions
hold:


(i) $\lim_{t \to \infty} p_j(t) = p_j$ for all j;


(ii) The solution $\{p_j\}$ is unique and satisfies Eqs. (11.40bc);


(iii) For each j, $p_j$ is the long-term proportion of time spent in state j.


Now assume that we know that the Markov chain is irreducible and that we have
a solution $\{p_j\}$ to the global balance equations (11.40bc):

\[
v_j p_j = \sum_{i \ne j} p_i\, \gamma_{ij}.
\]

Substituting Eq. (11.50) into the above equation

\[
c\, \pi_j = \frac{c\, \pi_j}{v_j}\, v_j = \sum_{i \ne j} \frac{c\, \pi_i}{v_i}\, \gamma_{ij}
= c \sum_{i \ne j} \pi_i \left(\frac{\gamma_{ij}}{v_i}\right) = c \sum_{i \ne j} \pi_i\, \tilde{q}_{ij}
\]

implies that the following choice of $\{\pi_j\}$ gives a solution for the stationary pmf of the
embedded Markov chain:

\[
\pi_j = \frac{p_j v_j}{\sum_i p_i v_i}.
\]

Note that we must require that the denominator be finite. From Theorem 1 in Section 11.4,
if there is a stationary pmf then it is unique and the chain is positive recurrent. Furthermore the
construction of $\{\pi_j\}$ from the $\{p_j\}$ ensures that $p_j$ is the long-term proportion of time
in state j as well as the limiting state probability for X(t).

We have shown the following theorem: 


Theorem 4 


Assume a time-continuous Markov chain, for which the embedded Markov chain is irreducible.
Suppose that $\{p_j\}$ is a solution to the global balance equations (11.40bc), and that $\sum_j p_j v_j < \infty$,
then the following assertions hold:


(i) The solution $\{p_j\}$ is unique;


(ii) $\lim_{t \to \infty} p_j(t) = p_j$ for all j;


(iii) For each j, $p_j$ is the long-term proportion of time spent in state j;


(iv) The embedded Markov chain is positive recurrent.


Example 11.41 


In the two-state system in Example 11.36, 


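\[
[\tilde{q}_{ij}] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]

The equation $\pi = \pi[\tilde{q}_{ij}]$ implies that

\[
\pi_0 = \pi_1 = \frac{1}{2}.
\]
In addition, $v_0 = \alpha$ and $v_1 = \beta$. Thus

\[
p_0 = \frac{\tfrac{1}{2}(1/\alpha)}{\tfrac{1}{2}(1/\alpha + 1/\beta)} = \frac{\beta}{\alpha + \beta}
\]
and
\[
p_1 = \frac{\alpha}{\alpha + \beta}.
\]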
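
The conversion between the stationary pmf of the embedded chain and that of the continuous-time chain is a short computation. The following sketch implements Eq. (11.50) and the inverse relation used before Theorem 4; the function names and the rates ALPHA = 1 and BETA = 3 are hypothetical choices for illustration.

```python
# Sketch: converting between the embedded-chain pmf {pi_j} and the
# continuous-time pmf {p_j}. Names and rates are illustrative assumptions.

def ctmc_pmf_from_embedded(pi, v):
    """p_i proportional to pi_i / v_i, normalized to sum to one (Eq. 11.50)."""
    weights = [pi_i / v_i for pi_i, v_i in zip(pi, v)]
    total = sum(weights)
    return [w / total for w in weights]

def embedded_pmf_from_ctmc(p, v):
    """pi_j proportional to p_j * v_j, normalized to sum to one."""
    weights = [p_j * v_j for p_j, v_j in zip(p, v)]
    total = sum(weights)
    return [w / total for w in weights]

ALPHA, BETA = 1.0, 3.0                      # hypothetical rates v_0 and v_1
p = ctmc_pmf_from_embedded([0.5, 0.5], [ALPHA, BETA])
pi_back = embedded_pmf_from_ctmc(p, [ALPHA, BETA])
print(p, pi_back)
```

States with long mean occupancy times (small v_i) receive more weight in {p_j} than in {pi_j}, which is why the two pmf's differ even though they describe the same sample paths.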


11.5 TIME-REVERSED MARKOV CHAINS


We now consider the random process that results when we play a Markov chain back- 
wards in time. We will see that the resulting process is also a Markov chain and so de- 
velop another method for obtaining the stationary probabilities of the forward and 
reverse processes. The insights gained by looking at the reverse process prove useful in
developing certain results in queueing theory in Chapter 12.


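Let $X_n$ be a stationary ergodic Markov chain⁶ with one-step transition probability
matrix $P = \{p_{ij}\}$ and stationary state pmf $\{\pi_j\}$. Consider the dependence of $X_{n-1}$,
the “future” in the reverse process, on $X_n, X_{n+1}, \ldots, X_{n+k}$, the “present and past”:

\[
\begin{aligned}
&P[X_{n-1} = j \mid X_n = i, X_{n+1} = i_1, \ldots, X_{n+k} = i_k] \\
&\qquad = \frac{P[X_{n-1} = j, X_n = i, X_{n+1} = i_1, \ldots, X_{n+k} = i_k]}{P[X_n = i, X_{n+1} = i_1, \ldots, X_{n+k} = i_k]} \\
&\qquad = \frac{\pi_j\, p_{ji}\, p_{i,i_1} \cdots p_{i_{k-1},i_k}}{\pi_i\, p_{i,i_1} \cdots p_{i_{k-1},i_k}} \\
&\qquad = \frac{\pi_j\, p_{ji}}{\pi_i} \\
&\qquad = P[X_{n-1} = j \mid X_n = i].
\end{aligned}
\tag{11.52}
\]


6That is, let it be an irreducible, aperiodic, stationary Markov chain.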
The above equations show that the time-reversed process is also a Markov chain with
one-step transition probabilities
\[
q_{ij} = P[X_{n-1} = j \mid X_n = i] = \frac{\pi_j\, p_{ji}}{\pi_i}. \tag{11.53}
\]

Since $X_n$ is irreducible and aperiodic, its stationary state probabilities $\pi_j$ represent
the proportion of time that the process is in state j. This proportion of time does not
depend on whether one goes forward or backward in time, so $\pi_j$ must also be the stationary
pmf for the reverse process. Thus the forward and reverse process must have the
same stationary pmf.


Example 11.42 


Suppose that a new light bulb is put in use at day n = 0, and suppose that each time a light bulb 
fails it is replaced the next day. Let $X_n$ be the age of the light bulb (in days) at the end of day n.
If $a_i$ is the probability that the lifetime L of a light bulb is i days, then the probability that the
light bulb fails on day j given that it has not failed up to then is

\[
b_j = P[L = j \mid L \ge j] = \frac{a_j}{\sum_{k=j}^{\infty} a_k} \qquad j = 1, 2, \ldots.
\]

Thus the transition probabilities for $X_n$ are

\[
p_{i,i+1} = 1 - b_i \qquad i = 1, 2, \ldots
\]
\[
p_{i,1} = b_i \qquad i = 1, 2, \ldots
\]
\[
p_{ij} = 0 \qquad \text{otherwise.}
\]


Figure 11.15(a) shows the state transition diagram of $X_n$, and Fig. 11.16(a) shows a typical sample
function that consists of a sawtooth-shaped function that increases linearly and then falls
abruptly to one when a light bulb fails.


FIGURE 11.15
(a) Transition diagram for age of a renewal process. (b) Transition diagram for
time-reversed process.


FIGURE 11.16
(a) Age of light bulb in use at time n. (b) Time-reversed process of $X_n$.


Figure 11.16(b) shows a sample function of the reverse process from which we deduce that
the state transition diagram must be as shown in Fig. 11.15(b). The transition probabilities for the
reverse process are obtained from Eq. (11.53):

\[
q_{i,i-1} = \frac{\pi_{i-1}}{\pi_i}(1 - b_{i-1}) \qquad i = 2, 3, 4, \ldots
\]
\[
q_{1,i} = \frac{\pi_i}{\pi_1}\, b_i \qquad i = 1, 2, \ldots
\]
\[
q_{ij} = 0 \qquad \text{otherwise.}
\]

For now we defer the problem of finding the stationary state probabilities $\pi_j$.


Example 11.42 shows that Eq. (11.53) provides us with conditions that must be
satisfied by the stationary probabilities $\pi_j$. Suppose we were able to guess a pmf $\{\pi_i\}$
so that Eq. (11.53) holds, that is,

\[
\pi_i\, q_{ij} = \pi_j\, p_{ji} \qquad \text{for all } i, j. \tag{11.54}
\]

It then follows that $\{\pi_i\}$ is the stationary pmf. To see this, sum Eq. (11.54) over all j,
then

\[
\sum_j \pi_j\, p_{ji} = \pi_i \sum_j q_{ij} = \pi_i \qquad \text{for all } i. \tag{11.55}
\]

But Eq. (11.55) is the condition for $\pi_i$ to be the stationary pmf for the forward process,
thus $\pi_i$ is the stationary pmf. Equation (11.54) thus provides us with another method
for finding the stationary pmf of a discrete-time Markov chain: If we can guess a set of
transition probabilities $q_{ij}$ for the reverse process and a pmf $\pi_i$ so that Eq. (11.54) is satisfied,
then it follows that the $\pi_i$ is the stationary pmf for the Markov chain and the $q_{ij}$
are the transition probabilities for the reverse process.


Example 11.43 


The sample function of the reverse process in Example 11.42 suggests that for i > 1, the process
moves from state i to state i − 1 with probability one; that is,

\[
q_{i,i-1} = \frac{\pi_{i-1}(1 - b_{i-1})}{\pi_i} = 1 \qquad i = 2, 3, \ldots,
\]
which implies that
\[
\begin{aligned}
\pi_i &= (1 - b_{i-1})\, \pi_{i-1} \qquad i = 2, 3, \ldots \\
&= (1 - b_{i-1})(1 - b_{i-2}) \cdots (1 - b_1)\, \pi_1.
\end{aligned}
\tag{11.56}
\]

However, from Example 11.42 for $i \ge 2$,

\[
(1 - b_{i-1}) = 1 - \frac{a_{i-1}}{\sum_{k=i-1}^{\infty} a_k} = \frac{\sum_{k=i}^{\infty} a_k}{\sum_{k=i-1}^{\infty} a_k},
\]

so in Eq. (11.56), the denominator of $(1 - b_{i-1})$ cancels the numerator of $(1 - b_{i-2})$, the denominator
of $(1 - b_{i-2})$ cancels the numerator of $(1 - b_{i-3})$, and so on. Thus

\[
\pi_i = \pi_1 \frac{\sum_{k=i}^{\infty} a_k}{\sum_{k=1}^{\infty} a_k} = \pi_1 P[L \ge i].
\]

We obtain $\pi_1$ by using the fact that the probabilities sum to one:
\[
1 = \pi_1 \sum_{i=1}^{\infty} P[L \ge i] = \pi_1 E[L],
\]

where we have used Eq. (4.29) for E[L]. Thus

\[
\pi_i = \frac{P[L \ge i]}{E[L]} \qquad i = 1, 2, \ldots. \tag{11.57}
\]
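
The result $\pi_i = P[L \ge i]/E[L]$ can be checked on a small case. The sketch below assumes a lifetime pmf with finite support (a hypothetical three-day example), builds the age chain of Example 11.42 on states 1 to K, and confirms both stationarity and the guess that the reverse chain steps down by one with probability one. Function names are illustrative.

```python
# Sketch: stationary pmf of the light-bulb age chain for a finite lifetime pmf.
# The lifetime pmf and function names are illustrative assumptions.

def age_chain_matrix(lifetime_pmf):
    """Transition matrix on ages 1..K with p_{i,1} = b_i and p_{i,i+1} = 1 - b_i.

    lifetime_pmf[k-1] = a_k = P[L = k]; the last entry must be positive.
    """
    K = len(lifetime_pmf)
    P = [[0.0] * K for _ in range(K)]
    for i in range(K):                              # index i corresponds to age i + 1
        tail = sum(lifetime_pmf[i:])                # P[L >= i + 1]
        b = lifetime_pmf[i] / tail                  # failure probability at this age
        P[i][0] += b                                # bulb fails, new bulb has age 1
        if i + 1 < K:
            P[i][i + 1] += 1.0 - b                  # bulb survives, age increases
    return P

def age_chain_stationary(lifetime_pmf):
    """pi_i = P[L >= i] / E[L], using E[L] = sum over i of P[L >= i]."""
    tails = [sum(lifetime_pmf[i:]) for i in range(len(lifetime_pmf))]
    mean_lifetime = sum(tails)
    return [t / mean_lifetime for t in tails]

def step(pmf, P):
    """One step of the chain: the row vector pmf multiplied by P."""
    n = len(P)
    return [sum(pmf[i] * P[i][j] for i in range(n)) for j in range(n)]

LIFETIME = [0.2, 0.5, 0.3]                          # hypothetical a_1, a_2, a_3
P = age_chain_matrix(LIFETIME)
pi = age_chain_stationary(LIFETIME)
# Reverse-chain probabilities q_{i,i-1} = pi_{i-1} p_{i-1,i} / pi_i for i = 2..K.
reverse_down = [pi[i - 1] * P[i - 1][i] / pi[i] for i in range(1, len(pi))]
print(pi, reverse_down)
```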


11.5.1 Time-Reversible Markov Chains


A stationary ergodic Markov chain is said to be reversible if the one-step transition
probability matrix of the forward and reverse processes are the same, that is, if

\[
q_{ij} = p_{ij} \qquad \text{for all } i, j. \tag{11.58}
\]

Equations (11.53) and (11.58) together imply that a Markov chain is reversible if and
only if

\[
\pi_i\, p_{ij} = \pi_j\, p_{ji} \qquad \text{for all } i, j. \tag{11.59}
\]

Since $\pi_i$ and $\pi_j$ are the long-term proportion of transitions out of states i and j, respectively,
Eq. (11.59) implies that a chain is reversible if the proportion of transitions from
i to j is equal to the proportion of transitions from j to i.
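
Condition (11.59) is easy to test numerically once the stationary pmf is known. The sketch below finds the stationary pmf by repeated multiplication and then checks the condition entry by entry; the two 3-state matrices are hypothetical examples (a small birth-and-death chain and a chain with a preferred direction of rotation), and the function names are illustrative.

```python
# Sketch: testing reversibility via pi_i p_ij = pi_j p_ji.
# The example matrices and helper names are illustrative assumptions.

def step(pmf, P):
    """One step of the chain: the row vector pmf multiplied by P."""
    n = len(P)
    return [sum(pmf[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, num_iter=5000):
    """Stationary pmf of an irreducible, aperiodic finite chain by power iteration."""
    pmf = [1.0 / len(P)] * len(P)
    for _ in range(num_iter):
        pmf = step(pmf, P)
    return pmf

def is_reversible(P, tol=1e-9):
    """True if pi_i p_ij equals pi_j p_ji for every pair of states (within tol)."""
    pi = stationary(P)
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))

# Transitions only between adjacent states: a small birth-and-death chain.
BIRTH_DEATH = [
    [0.5, 0.5, 0.0],
    [0.3, 0.2, 0.5],
    [0.0, 0.4, 0.6],
]
# Mostly moves 0 -> 1 -> 2 -> 0, so forward and reverse sample paths look different.
ROTATING = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
print(is_reversible(BIRTH_DEATH), is_reversible(ROTATING))
```

The rotating chain has a uniform stationary pmf, yet it fails the test because transitions from 0 to 1 occur eight times as often as transitions from 1 to 0.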


Example 11.44 Discrete-Time Birth-and-Death Process 


Figure 11.17 shows the state transition diagram for a discrete-time birth-and-death process with 
transition probabilities 


\[
p_{00} = 0 \qquad p_{01} = 1 = a_0
\]
\[
p_{i,i+1} = a_i \qquad p_{i,i-1} = 1 - a_i \qquad i = 1, 2, \ldots
\]
\[
p_{ij} = 0 \qquad \text{otherwise.}
\]


FIGURE 11.17
Transition diagram for a discrete-time birth-and-death process.


For any sample path, the number of transitions from i to i + 1 can differ by at most 1 from the
number of transitions from i + 1 to i since the only way to return to i is through i + 1. Thus the
long-term proportion of transitions from i to i + 1 is equal to that from i + 1 to i. Since these are
the only possible transitions, it follows that birth-and-death processes are reversible.

Equation (11.59) implies that

\[
a_j\, \pi_j = (1 - a_{j+1})\, \pi_{j+1} \qquad j = 0, 1, 2, \ldots,
\]

which allows us to write all the $\pi_j$'s in terms of $\pi_0$:

\[
\pi_j = \frac{a_{j-1}}{1 - a_j}\, \pi_{j-1} = \frac{a_{j-1} \cdots a_0}{(1 - a_j) \cdots (1 - a_1)}\, \pi_0 = R_j\, \pi_0, \tag{11.60}
\]
with $R_0 = 1$. The probability $\pi_0$ is found from
\[
1 = \pi_0 \sum_{j=0}^{\infty} R_j. \tag{11.61}
\]

The series in Eq. (11.61) must converge in order for $\pi_j$ to exist.


11.5.2 Time-Reversible Continuous-Time Markov Chains 


Now consider a stationary, continuous-time Markov chain played backward in time. If
X(t) = i (i.e., the process is in state i at time t), then the probability that the reverse
process remains in state i for an additional s seconds is

\[
\begin{aligned}
P[X(t') = i,\ t - s \le t' \le t \mid X(t) = i]
&= \frac{P[X(t - s) = i,\ T_i > s]}{P[X(t) = i]} \\
&= \frac{P[X(t - s) = i]\, P[T_i > s]}{P[X(t) = i]} \\
&= P[T_i > s] = e^{-v_i s},
\end{aligned}
\tag{11.62}
\]

where $P[X(t - s) = i] = P[X(t) = i]$ because X(t) is a stationary process, and
where $T_i$ is the time spent in state i for the forward process. Thus the reverse process
also spends an exponentially distributed amount of time with rate $v_i$ in state i.

The jumps in the forward process X(t) are determined by the embedded Markov
chain $\tilde{q}_{ij}$, so the jumps in the reverse process are determined by the discrete-time Markov
chain corresponding to the time-reversed embedded Markov chain given by Eq. (11.53):

\[
\tilde{q}'_{ij} = \frac{\pi_j\, \tilde{q}_{ji}}{\pi_i}. \tag{11.63}
\]

It follows that the transition rates for the time-reversed continuous-time process are
given by

\[
\gamma'_{ij} = v_i\, \tilde{q}'_{ij} = \frac{v_i\, \pi_j\, \tilde{q}_{ji}}{\pi_i}
= \frac{v_i\, \pi_j\, \gamma_{ji}}{\pi_i\, v_j} = \frac{p_j\, \gamma_{ji}}{p_i}, \tag{11.64}
\]

where we used the fact that $\tilde{q}_{ji} = \gamma_{ji}/v_j$ and $p_j = c\, \pi_j / v_j$. In comparing Eq. (11.64) to
Eq. (11.53), note that the transition rates have simply replaced the transition probabilities
in going from the discrete-time to the continuous-time case.

The discussion that led to Eq. (11.54) provides us with another method for 
determining the stationary pmf $p_j$ of $X(t)$. If we can guess a set of transition rates $\tilde{\gamma}_{ij}$ and a 
pmf $p_j$ such that 

$$p_i \tilde{\gamma}_{ij} = p_j \gamma_{ji} \qquad \text{for all } i, j \qquad (11.65a)$$

and 

$$\sum_{j \ne i} \tilde{\gamma}_{ij} = \sum_{j \ne i} \gamma_{ij} \qquad \text{for all } i, \qquad (11.65b)$$

then $p_j$ is the stationary pmf for $X(t)$ and $\tilde{\gamma}_{ij}$ are the transition rates for the reverse process. 

Since the state occupancy times in the forward and reverse processes are 
exponential random variables with the same mean, the continuous-time Markov chain $X(t)$ 
is reversible if and only if its embedded Markov chain is reversible. Equation (11.59) 
implies that the following condition must be satisfied: 

$$\pi_i q_{ij} = \pi_j q_{ji} \qquad \text{for all } i, j, \qquad (11.66)$$

where $\pi_j$ is the stationary pmf of the embedded Markov chain. Recall from Eq. (11.50) that 
$\pi_j = c\nu_j p_j$, where $p_j$ is the stationary pmf of $X(t)$. Substituting into Eq. (11.66), we obtain 

$$p_i \nu_i q_{ij} = p_j \nu_j q_{ji},$$

which is equivalent to 

$$p_i \gamma_{ij} = p_j \gamma_{ji}. \qquad (11.67)$$

Thus we conclude that $X(t)$ is reversible if and only if Eq. (11.67) is satisfied. As in the 
discrete-time case, Eq. (11.67) can be interpreted as stating that the rate at which $X(t)$ 
goes from state $i$ to state $j$ is equal to the rate at which $X(t)$ goes from state $j$ to state $i$. 
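
The reversibility condition in Eq. (11.67) is easy to test numerically once the stationary pmf is known. The following Octave sketch uses a hypothetical four-state birth-and-death rate matrix (birth rate 1, death rate 2; these values are assumptions for illustration only). It forms the flow matrix with entries $p_i \gamma_{ij}$ and checks that it is symmetric.

```octave
% Hypothetical four-state birth-and-death chain: birth rate lambda, death rate mu.
lambda = 1;
mu = 2;
G = [-lambda        lambda          0              0;
      mu           -(lambda + mu)   lambda         0;
      0             mu             -(lambda + mu)  lambda;
      0             0               mu            -mu];
A = G;
A(:, 1) = 1;                 % replace first column with ones (normalization)
p = [1 0 0 0] / A;           % solves p * A = b, i.e. p*G = 0 with sum(p) = 1
F = diag(p) * G;             % F(i,j) = p_i * gamma_ij, probability flow i -> j
is_reversible = max(max(abs(F - F'))) < 1e-12
```

A symmetric flow matrix means every pair of states exchanges probability at equal rates in both directions, which is the content of Eq. (11.67).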


Example 11.45 Continuous-Time Birth-and-Death Process 


Consider the general continuous-time birth-and-death process introduced in Example 11.40. The 
embedded Markov chain in this process is a discrete-time birth-and-death process of the type 
discussed in Example 11.44. It therefore follows that all continuous-time birth-and-death 
processes are time-reversible. 
In Chapter 12 we will see that the time reversibility of certain Markov chains implies 
some remarkable properties about the departure processes of queueing systems. 


11.6 NUMERICAL TECHNIQUES FOR MARKOV CHAINS 


In this section we present several numerical techniques that are useful in the analysis of 
Markov chains. The first part of the section presents methods for finding the stationary as 
well as transient solutions for the state probabilities of Markov chains. The second part of 
the section addresses the simulation of discrete-time and continuous-time Markov chains. 


11.6.1 Stationary Probabilities of Markov Chains 


The most basic calculation with finite-state discrete-time Markov chains involves 
finding their stationary state probabilities. To do so, we consider the equation: 

$$\boldsymbol{\pi} = \boldsymbol{\pi}P \quad \text{or equivalently} \quad \mathbf{0} = \boldsymbol{\pi}(P - I). \qquad (11.68a)$$

In general the above set of linear equations is underdetermined. To see this, note that the 
sum of the columns of the matrix $P - I$ is the zero vector, because each row of $P$ sums to one. 
Therefore we need the normalization equation: $\pi_1 + \pi_2 + \cdots + \pi_K = 1$. We can incorporate 
this equation by replacing one of the columns of $P - I$ with the all 1's column vector. Let $Q$ be 
the matrix that results when we replace the first column of $P - I$; the system of linear equations becomes: 

$$\mathbf{b} = \boldsymbol{\pi}Q, \qquad (11.68b)$$

where $\mathbf{b}$ is a row vector with 1 in the first entry and zeros elsewhere. If the Markov 
chain is irreducible, then a unique stationary pmf exists and is obtained by inverting the 
above equation. 
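
The column-replacement procedure can be sketched in a few lines of Octave. The two-state chain below, with a = 0.1 and b = 0.3, is a hypothetical example chosen because its stationary pmf is known in closed form, namely (b/(a+b), a/(a+b)). The right-division operator solves $\mathbf{b} = \boldsymbol{\pi}Q$ without forming the inverse explicitly.

```octave
% Hypothetical two-state chain used only to illustrate Eq. (11.68b).
a = 0.1;
b = 0.3;
P = [1-a a; b 1-b];
K = size(P, 1);
Q = P - eye(K);
Q(:, 1) = 1;                    % replace first column by the all-ones vector
bvec = [1 zeros(1, K-1)];       % row vector b of Eq. (11.68b)
pmf = bvec / Q                  % solves pmf * Q = bvec
```

Solving the linear system directly is generally preferable to calling inv(Q), although for small matrices the two approaches give the same answer to within rounding.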


Example 11.46 Google PageRank 


Find the stationary pmf for the PageRank algorithm in Example 11.30. 
After we take $P - I$ from the example and replace the first column with all 1's we obtain: 

$$Q = \begin{bmatrix} 1 & 0.4550 & 0.4550 & 0.0300 & 0.0300 \\ 1 & -0.8000 & 0.2000 & 0.2000 & 0.2000 \\ 1 & 0.3133 & -0.9700 & 0.3133 & 0.0300 \\ 1 & 0.0300 & 0.0300 & -0.9700 & 0.8800 \\ 1 & 0.0300 & 0.4550 & 0.0300 & -0.5450 \end{bmatrix}.$$

We then invert $Q$ to obtain the pmf: 

$$\boldsymbol{\pi} = (0.13175,\ 0.18772,\ 0.24642,\ 0.13172,\ 0.30239).$$


The Octave commands for the above procedure are given below: 

> Q=[1 0.455 0.455 0.03 0.03 
> 1 -0.8 0.2 0.2 0.2 
> 1 0.3133 -.97 0.3133 0.03 
> 1 0.03 0.03 -0.97 0.88 
> 1 0.03 0.455 0.03 -.545]; 
> b=[1 0 0 0 0]; 
> p=b*inv(Q) 
p = 
0.13175 0.18772 0.24642 0.13172 0.30239 


In the case of infinite-state Markov chains, we can apply matrix inversion by trun- 
cating the state space at some value where the state probabilities become negligible. 
Another method, discussed in the next chapter, involves the application of the proba- 
bility generating function for the state of the system. 

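To find the stationary pmf for finite-state continuous-time Markov chains, we 
need to find a pmf that satisfies Eq. (11.40a) as well as the normalization condition: 

$$\mathbf{0} = \mathbf{p}\Gamma \quad \text{and} \quad 1 = \mathbf{p}\mathbf{e}, \qquad (11.69a)$$

where 

$$\Gamma = \begin{bmatrix} -\nu_0 & \gamma_{01} & \gamma_{02} & \cdots & \gamma_{0,K-1} \\ \gamma_{10} & -\nu_1 & \gamma_{12} & \cdots & \gamma_{1,K-1} \\ \vdots & \vdots & & \ddots & \vdots \\ \gamma_{K-1,0} & \gamma_{K-1,1} & \cdots & & -\nu_{K-1} \end{bmatrix} \quad \text{and} \quad \mathbf{e} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}. \qquad (11.69b)$$

The columns of $\Gamma$ sum to the zero vector, so as before we need to replace a column of $\Gamma$ with $\mathbf{e}$. We 
obtain $\mathbf{p}$ by multiplying $\mathbf{b}$ by the inverse of the resulting matrix. 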


Example 11.47 Cartridge Inventory 


An office orders laser printer cartridges in batches of four cartridges. Suppose that each car- 
tridge lasts for an exponentially distributed time with mean 1 month. Assume that a new batch of 
four cartridges becomes available as soon as the last cartridge in a batch runs out. Find the sta- 
tionary pmf for N(t), the number of cartridges available at time t. 

$N(t)$ takes on values from the set {1, 2, 3, 4} and follows a periodic sequence of values 
4 → 3 → 2 → 1 → 4 .... The rate out of each state is 1 and the rate into each state from the previous 
state is also 1. Therefore the transition rate matrix and the modified global balance equations are: 

$$\Gamma = \begin{bmatrix} -1 & 0 & 0 & 1 \\ 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \qquad \mathbf{b} = \mathbf{p} \begin{bmatrix} 1 & 0 & 0 & 1 \\ 1 & -1 & 0 & 0 \\ 1 & 1 & -1 & 0 \\ 1 & 0 & 1 & -1 \end{bmatrix}.$$

It is easy to show that $\mathbf{p} = (1/4, 1/4, 1/4, 1/4)$. In a more complicated case we would use 
numerical inversion to solve for $\mathbf{p}$. 
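
For completeness, the numerical route for this example can be sketched as follows. The matrix is the transition rate matrix of the cartridge example; the variable names are arbitrary. The last line checks that the result satisfies the global balance equations.

```octave
% Transition rate matrix of the cartridge inventory example (states 1..4).
G = [-1  0  0  1;
      1 -1  0  0;
      0  1 -1  0;
      0  0  1 -1];
A = G;
A(:, 1) = 1;                 % modified matrix: first column replaced by ones
b = [1 0 0 0];
p = b / A                    % solves p * A = b
residual = max(abs(p * G))   % should be zero up to rounding
```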


11.6.2 Time-Dependent Probabilities of Markov Chains 


We now consider finding the time-dependent probabilities of a finite-state discrete-time 
Markov chain as given by Eq. (8.16b). Example 11.9 described the general approach for 
finding $P^n$. First, however, we note a few facts about the transition probability matrix $P$. 

A stochastic matrix is defined as a nonnegative matrix for which the elements of 
each row add to one. Thus $P$ is a stochastic matrix. A stochastic matrix always has 
$\lambda = 1$ as an eigenvalue and $\mathbf{e}^T = (1, \ldots, 1)$ as a right eigenvector: $1\mathbf{e} = P\mathbf{e}$. This 
follows from the fact that all the row elements of $P$ add to one. On the other hand, the 
stationary pmf $\boldsymbol{\pi}$ is a left eigenvector for the $\lambda = 1$ eigenvalue of $P$: $1\boldsymbol{\pi} = \boldsymbol{\pi}P$. It can be 
shown [Gallager, pp. 116-117] that if $P$ corresponds to an aperiodic irreducible 
Markov chain, then $\lambda = 1$ is the largest eigenvalue and the magnitudes of all other 
eigenvalues are less than 1. 

Let $P$ correspond to an aperiodic irreducible Markov chain. Proceeding as in 
Example 11.19, to find $P^n$ we first find the eigenvalues $1 = \lambda_1 > |\lambda_2| \ge \cdots \ge |\lambda_K|$ 
and right eigenvectors of $P$: $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_K$. Letting $E$ be the matrix with eigenvectors as 
columns, we then have that: 

$$P^n = E\Lambda^n E^{-1} = E \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & \lambda_2^n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_K^n \end{bmatrix} E^{-1}. \qquad (11.70)$$

Note how all but the 1-1 entry in the diagonal matrix approach zero as $n$ increases. 
Note as well that the first column of $E$ is the all 1's vector. This implies that the first 
row of $E^{-1}$ contains the stationary pmf $\boldsymbol{\pi}$. In Octave the eigenvalues and 
eigenvectors of $P$ are obtained using the eig(P) function, which was discussed previously in 
Section 10.7. In practice it is simpler and more convenient to use the command P^n. 
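
The equivalence of the eigendecomposition in Eq. (11.70) and the direct matrix power can be checked with a short Octave sketch. The two-state chain and the value n = 20 are hypothetical choices for illustration; its second eigenvalue is 1 - a - b = 0.6, so the rows of $P^n$ should already be close to the stationary pmf (0.75, 0.25).

```octave
% Hypothetical two-state chain; compare E*Lambda^n*inv(E) with P^n.
a = 0.1;
b = 0.3;
n = 20;
P = [1-a a; b 1-b];
[E, Lam] = eig(P);                 % columns of E are right eigenvectors
Pn_eig = real(E * Lam^n / E);      % E * Lambda^n * inv(E), Eq. (11.70)
Pn_pow = P^n;                      % direct matrix power
err = max(max(abs(Pn_eig - Pn_pow)))
Pn_pow
```

Note that eig normalizes each eigenvector to unit length and does not guarantee the ordering of the eigenvalues, so the first column of E returned by Octave need not be the all 1's vector; the product in Eq. (11.70) is unaffected by this scaling.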
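
Next we consider finding the time-dependent probabilities of a finite-state 
continuous-time Markov chain that are the solution to Eq. (11.39): 

$$\mathbf{p}'(t) = [p_j'(t)] = \Big[ \sum_{i} p_i(t)\gamma_{ij} \Big] = \mathbf{p}(t)\Gamma \quad \text{subject to} \quad \mathbf{p}(0) = (p_1(0), \ldots, p_K(0)), \qquad (11.71)$$

where the sum runs over all $K$ states and $\gamma_{jj} = -\nu_j$. 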


We are now dealing with first-order vector differential equations. Electrical 
engineering students encounter this equation in an introductory linear systems course. The 
solution is given by: 

$$\mathbf{p}(t) = \mathbf{p}(0)P(t) = \mathbf{p}(0)e^{\Gamma t}, \qquad (11.72a)$$

where $P(t) = e^{\Gamma t}$ is the matrix of transition probabilities in an interval of length $t$ 
seconds, and where the exponential matrix function is defined by: 

$$P(t) = e^{\Gamma t} = \sum_{j=0}^{\infty} \frac{(\Gamma t)^j}{j!}. \qquad (11.72b)$$

Furthermore, using matrix diagonalization the exponential matrix can be evaluated as: 

$$P(t) = E\,[e^{\lambda_i t}]\,E^{-1}, \qquad (11.72c)$$

where $E$ is a matrix whose columns are the eigenvectors of $\Gamma$ and the middle matrix is 
a diagonal matrix with exponential functions as its elements. [Gallager, p. 194] shows 
that if the Markov chain is finite state and irreducible, then $\Gamma$ has an eigenvalue $\lambda = 0$ 
which has right eigenvector $\mathbf{e}^T = (1, 1, \ldots, 1)$. $\Gamma$ also has a left eigenvector $\mathbf{p}$ 
corresponding to $\lambda = 0$ which is the unique stationary state pmf. Furthermore the 
remaining eigenvalues of $\Gamma$ have negative real parts. This implies that all but the $\lambda = 0$ 
exponential terms in the diagonal matrix decay to zero as $t$ increases. If we let $\lambda = 0$ 
occupy the 1-1 entry in the diagonal matrix, then as $t \to \infty$, $P(t)$ approaches the product of 
$\mathbf{e}$ and the first row of $E^{-1}$. 


Example 11.48 Cartridge Inventory 


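Find the state probabilities for $N(t)$ in Example 11.47 if $N(0) = 4$. 
We use the eig(Γ) function to obtain the eigenvalues and eigenvectors of $\Gamma$ and the 
associated matrices $E$, $\Lambda$, and $E^{-1}$: 

$$E = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & j & -j & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -j & j & -1 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & -1-j & 0 & 0 \\ 0 & 0 & -1+j & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix}, \quad E^{-1} = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & j & -1 & -j \\ 1 & -1 & 1 & -1 \end{bmatrix}.$$

Note that two of the eigenvalues and their corresponding eigenvectors are complex. The state 
probabilities are given by: 

$$\mathbf{p}(t) = \mathbf{p}(0)\,E \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & e^{-(1+j)t} & 0 & 0 \\ 0 & 0 & e^{-(1-j)t} & 0 \\ 0 & 0 & 0 & e^{-2t} \end{bmatrix} E^{-1}$$

$$= (0, 0, 0, 1)\,\frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & j & -j & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -j & j & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & e^{-(1+j)t} & 0 & 0 \\ 0 & 0 & e^{-(1-j)t} & 0 \\ 0 & 0 & 0 & e^{-2t} \end{bmatrix} \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & j & -1 & -j \\ 1 & -1 & 1 & -1 \end{bmatrix}$$

$$= \frac{1}{4} \left( 1,\ -je^{-(1+j)t},\ je^{-(1-j)t},\ -e^{-2t} \right) \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & j & -1 & -j \\ 1 & -1 & 1 & -1 \end{bmatrix}$$

$$= \frac{1}{4} \left( 1 - 2e^{-t}\sin t - e^{-2t},\ 1 - 2e^{-t}\cos t + e^{-2t},\ 1 + 2e^{-t}\sin t - e^{-2t},\ 1 + 2e^{-t}\cos t + e^{-2t} \right).$$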
Figure 11.18 shows the four state probabilities vs. time. It can be seen that all of the probability 
mass is initially in state 4 and that the mass first transfers to state 3, then state 2, and finally to 
state 1. Eventually all state probabilities approach the steady state value of 1/4. 
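
The closed-form result can be cross-checked against Octave's built-in matrix exponential expm. The sketch below evaluates both at a single, arbitrarily chosen time t = 1.5 (an assumption for illustration) and reports the largest discrepancy.

```octave
% Cartridge inventory chain: compare p(0)*expm(G*t) with the closed form.
G = [-1  0  0  1;
      1 -1  0  0;
      0  1 -1  0;
      0  0  1 -1];
p0 = [0 0 0 1];                      % N(0) = 4
t = 1.5;                             % arbitrary evaluation time
p_num = p0 * expm(G * t);            % Eq. (11.72a) evaluated numerically
p_closed = 0.25 * [1 - 2*exp(-t)*sin(t) - exp(-2*t), ...
                   1 - 2*exp(-t)*cos(t) + exp(-2*t), ...
                   1 + 2*exp(-t)*sin(t) - exp(-2*t), ...
                   1 + 2*exp(-t)*cos(t) + exp(-2*t)];
err = max(abs(p_num - p_closed))
```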

FIGURE 11.18 
Time-dependent probabilities in cartridge inventory. 


11.6.3 Simulation of Markov Chains 


We simulate a Markov chain by emulating its underlying random experiments. We 
begin by selecting the initial state according to an initial state pmf. We then generate 
the sequence of states by producing outcomes according to the associated transition 
probabilities. In the case of continuous-time Markov chains we also need to generate a 
state occupancy time after each state transition has been determined. Figure 11.19 
shows the inputs and outputs of generic modules for generating realizations of a 
Markov chain. 


Discrete-Time Markov Chains The module for generating a sequence of states for a 
Markov chain requires the following inputs: i. The state space; ii. The matrix of state 
transition probabilities; iii. The initial state probability mass function; and iv. The num- 
ber of steps in the simulation sequence. The module operates as follows: 


1. Generates the initial state according to the initial state pmf. 


2. Repeatedly generates the next state according to the transition probabilities of 
the current state. 


3. Stops when the required number of steps has been simulated. 
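
The three steps above can be sketched without any special random-number functions by comparing a uniform random number with the cumulative row probabilities. The transition matrix, initial pmf, run length, and seed below are all hypothetical values chosen for illustration.

```octave
% Minimal sketch of a discrete-time Markov chain simulator (assumed inputs).
P  = [0.9 0.1; 0.3 0.7];        % hypothetical transition probability matrix
p0 = [1 0];                     % hypothetical initial state pmf
L  = 20000;                     % number of steps
rand('state', 1);               % fix the seed so the run is repeatable
C = cumsum(P, 2);               % row-wise cumulative probabilities
C(:, end) = 1;                  % guard against rounding in the last entry
c0 = cumsum(p0);
c0(end) = 1;
seq = zeros(1, L);
seq(1) = find(rand() <= c0, 1);                 % step 1: initial state
for n = 2:L
  seq(n) = find(rand() <= C(seq(n-1), :), 1);   % step 2: next state
end                                             % step 3: stop after L steps
freq = [mean(seq == 1), mean(seq == 2)]
```

For this chain the stationary pmf is (0.75, 0.25), so the relative frequencies in freq should be close to these values for a long run.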


FIGURE 11.19 
Generic modules for simulating Markov chains: a discrete-time Markov chain simulator and a continuous-time Markov chain simulator. 

Example 11.49 Discrete-Time Markov Chain 


Develop a program to generate Markov chains with the state transition diagram as shown 
in Fig. 11.20(a). Note that the Markov chain is similar to that of a birth-death process except that 
transitions from a state to itself are allowed. Use the program to simulate 1000 time steps in a 
data multiplexer where in each time unit a packet is received with probability a, and/or a packet 
is transmitted from its buffer with probability b. Assume the data multiplexer is initially empty. 

For this example we wrote the function Discrete_MC(Nmax,P,IC,L). The state space is 
{0, 1, ..., Nmax}. Since Octave uses indices from 1 onwards, the array state ranges from 1 to 
Nmax + 1. For the Markov chains under consideration we need to specify only three transition 
probabilities for each state. Therefore P is an (Nmax + 1)-row by 3-column 
matrix. The initial state pmf is an (Nmax + 1) by 1 vector. The output of the function is a vector of 
states of size L. 

The Markov chain for the data multiplexer has the following transition probabilities. If 
N = 0, that is, the system is empty, the next state is either N = 1 with probability a, or N = 0 
with probability 1 - a, that is: $p_{00} = 1 - a$, $p_{01} = a$. If N = n > 0, the next state is n + 1 with 
probability (1 - b)a; n with probability ab + (1 - a)(1 - b); or n - 1 with probability b(1 - a), that is: 
$p_{n,n+1} = (1 - b)a$, $p_{n,n} = ab + (1 - a)(1 - b)$, $p_{n,n-1} = (1 - a)b$. If N = Nmax, the next state is Nmax - 1 with 
probability (1 - a)b; or Nmax with probability 1 - b(1 - a), since the system is not allowed to 
grow beyond Nmax. 

The code below prepares the inputs and then calls the function Discrete_MC(Nmax,P,IC,L). 
The basic step in the function involves generating a discrete random variable that determines 
whether the chain increases by 1, decreases by 1, or remains the same. 


Nmax=50; 
P=zeros(Nmax+1,3); 
a=0.45; 
b=0.50; 
P(1,:)=[0,1-a,a];                       % empty system: stay w.p. 1-a, go up w.p. a 
r=[(1-a)*b,a*b+(1-a)*(1-b),(1-b)*a];    % [down, stay, up] for interior states 
for n=2:Nmax 
  P(n,:)=r; 
end 
P(Nmax+1,:)=[(1-a)*b,1-(1-a)*b,0];      % full buffer: no upward transition 
IC=zeros(Nmax+1,1); 
IC(1,1)=1;                              % start in the empty state 
L=1000; 
Seq=Discrete_MC(Nmax,P,IC,L); 
plot(Seq-1)                             % subtract 1 to map array index to state 


function stseq = Discrete_MC(Nmax,P,IC,L) 
  % discrete_rnd(n,v,p) is called as in the text; its argument order may differ in other Octave versions. 
  stseq=zeros(1,L); 
  s=[1:Nmax+1]; 
  step=[-1,0,1]; 
  InitSt=discrete_rnd(1,s,IC); 
  stseq(1)=InitSt; 
  for n=2:L 
    nextst=stseq(n-1)+discrete_rnd(1,step,P(stseq(n-1),:)); 
    stseq(n)=nextst; 
  end 
end 


FIGURE 11.20 
Generic Markov chains: (a) discrete-time; (b) birth-death continuous-time. 

FIGURE 11.21 
(a) Simulation of discrete-time data multiplexer; (b) histogram of number of packets in data multiplexer. 


Figure 11.21(a) shows a graph of a 1000-step realization of the Markov chain. The 
parameters in the simulation are a = 0.45 and b = 0.5. The latter parameter implies that a packet 
requires two time units on average of service before it departs the system. During the two time 
units that it takes to service the above packet, 2 × (0.45) = 0.9 packets arrive on average. This is 
an example of a “heavy traffic” situation which is characterized by the sporadic but sustained 
buildups of packets seen in the simulation. Figure 11.21(b) shows the histogram of the state 
occurrences in the simulation. It can be seen that the probability mass is concentrated at the lower 
state values. 
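
The histogram can be compared with the exact stationary pmf of the multiplexer chain, obtained with the column-replacement method of Eq. (11.68b). The sketch below is one way to do this, assuming the same parameters as the simulation (Nmax = 50, a = 0.45, b = 0.5); the variable names down, up, and stay are introduced here for readability and do not appear in the original code.

```octave
% Exact stationary pmf of the data multiplexer chain (assumed same parameters).
Nmax = 50;  a = 0.45;  b = 0.50;
down = (1-a)*b;                 % departure only
up   = a*(1-b);                 % arrival only
stay = 1 - down - up;           % both or neither
K = Nmax + 1;
P = zeros(K);
P(1,1) = 1 - a;   P(1,2) = a;   % empty system
for n = 2:Nmax
  P(n,n-1) = down;  P(n,n) = stay;  P(n,n+1) = up;
end
P(K,K-1) = down;  P(K,K) = 1 - down;   % full buffer
Q = P - eye(K);
Q(:,1) = 1;
pmf = [1 zeros(1,Nmax)] / Q;    % solves pmf * Q = b
pmf(1:5)
```

For interior states the pmf decays geometrically with ratio up/down, which is consistent with the mass concentrating at the lower state values.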


Continuous-Time Markov Chains The module for generating a sequence of states for 
a continuous-time Markov chain requires the following inputs: i. The state space; ii. The 
matrix of state transition rates; iii. The initial state probability mass function; and iv. 
The duration of the simulation. The module operates as follows: 


1. Generates the initial state according to the initial state pmf. 


2. Repeatedly generates the next state using the transition probabilities from the 
current state, and the state occupancy time for the new state. 


3. Stops when the elapsed time reaches the required duration. 


Example 11.50 Continuous-Time Birth-Death Process 


Develop a program to generate continuous-time Markov chains with the state transition dia- 
gram shown in Figure 11.20(b). Use the program to simulate 1000 seconds of an M/M/1 queueing 
system. Assume the system is initially empty. 

For this example we wrote the function Continuous_MC(Nmax,G,IC,T), given below. The 
module uses the embedded Markov chain approach and sequentially generates next state and 
occupancy time pairs. The transition probabilities for the embedded Markov chain are 
$\{q_{j,j-1} = \mu_j/(\lambda_j + \mu_j),\ q_{j,j+1} = \lambda_j/(\lambda_j + \mu_j)\}$ and the occupancy times are exponential 
random variables with mean $\{1/(\lambda_j + \mu_j)\}$. The basic step involves generating a binary random 
variable that determines whether the chain increases or decreases by 1, and then determines the 
occupancy time in the resulting state. 


function [stseq,OccTime,n] = Continuous_MC(Nmax,G,IC,T) 
  % G(k,1) and G(k,2) are the rates of the -1 and +1 transitions out of the state with index k. 
  Taggr=-1; 
  L=ceil(T*(G(Nmax-1,1)+G(Nmax-1,2)));   % Estimate max number of state transitions. 
  stseq=zeros(1,L); 
  OccTime=zeros(1,L); 
  Q=zeros(1,2); 
  s=[1:Nmax+1]; 
  step=[-1,1]; 
  InitSt=discrete_rnd(1,s,IC); 
  stseq(1)=InitSt; 
  n=1; 
  OccTime(n)=exponential_rnd(G(stseq(n),1)+G(stseq(n),2)); 
  Taggr=OccTime(n); 
  while (Taggr < T) 
    n=n+1; 
    Q(stseq(n-1),:)=[G(stseq(n-1),1),G(stseq(n-1),2)]/(G(stseq(n-1),1)+G(stseq(n-1),2)); 
    nextst=stseq(n-1)+discrete_rnd(1,step,Q(stseq(n-1),:)); 
    stseq(n)=nextst; 
    OccTime(n)=exponential_rnd(G(stseq(n),1)+G(stseq(n),2)); 
    Taggr=Taggr+OccTime(n); 
  end 
end 


FIGURE 11.22 
Simulation of M/M/1 continuous-time Markov chain. 

Figure 11.22 shows a graph of a realization of the Markov chain. The simulated queueing 
system has an arrival rate of λ = 0.9 jobs/second and a mean job service time of 1/μ = 1 second. 
Therefore the system is operating in heavy traffic and experiences surges in job backlogs. The 
calculation of the proportion of time that the system spends in each state is more complicated 
than for discrete-time systems because the occupancy times must be taken into account. These 
calculations will be addressed in the next chapter. 
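
To give a flavor of how occupancy times enter the calculation, the following Octave sketch simulates an M/M/1 queue and accumulates the time spent in each state, so that each state is weighted by its occupancy time rather than by its number of visits. It is a self-contained illustration rather than the Continuous_MC function above: the lighter load (λ = 0.5, μ = 1), the duration, the seed, and all variable names are assumptions, and exponential times are generated as -log(U)/rate from a uniform U.

```octave
% Time-weighted state proportions for a simulated M/M/1 queue (assumed parameters).
lambda = 0.5;  mu = 1;  T = 20000;
rand('state', 2);                       % fix the seed so the run is repeatable
state = 0;                              % system initially empty
t = 0;
occ = zeros(1, 200);                    % occ(k+1) = total time spent in state k
while t < T
  rate = lambda + mu * (state > 0);     % total rate out of the current state
  dwell = -log(rand()) / rate;          % exponential occupancy time, mean 1/rate
  dwell = min(dwell, T - t);            % do not run past the end of the simulation
  if state + 1 > numel(occ)
    occ(state + 1) = 0;                 % grow the accumulator if needed
  end
  occ(state + 1) += dwell;
  t += dwell;
  if state == 0 || rand() < lambda / (lambda + mu)
    state += 1;                         % arrival
  else
    state -= 1;                         % departure
  end
end
frac = occ / T;                         % proportion of time in each state
frac(1:4)
```

For a stable M/M/1 queue the stationary pmf is geometric, so with λ/μ = 0.5 the proportions should be near 0.5, 0.25, 0.125, and so on. In heavy traffic, as in the example, much longer runs are needed before the proportions settle.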


SUMMARY 


• A random process is said to be Markov if the future of the process, given the 
present, is independent of the past. 

• A Markov chain is an integer-valued Markov process. 

• The joint pmf for a Markov chain at several time instants is equal to the product 
of the probability of the state at the first time instant and the probabilities of the 
subsequent state transitions (Eq. 11.3). 

• For discrete-time Markov chains: (1) the n-step transition probability matrix P(n) 
is equal to $P^n$, where P is the one-step transition probability matrix; (2) the state 
probability after n steps p(n) is equal to $p(0)P^n$, where p(0) is the initial state 
probability; and (3) $P^n$ approaches a constant matrix as $n \to \infty$ for Markov chains that 
settle into steady state. 

• The states of a discrete-time Markov chain can be divided into disjoint classes. 
The long-term behavior of a Markov chain is determined by the properties of its 
classes. In particular, for ergodic Markov chains the stationary state probabilities 
represent the long-term proportion of time spent in each state. 

• A continuous-time Markov chain can be viewed as consisting of a discrete-time 
embedded Markov chain that determines the state transitions and of 
exponentially distributed state occupancy times. 

• For continuous-time Markov chains: (1) the state probabilities and the 
transition probability matrix can be found by solving Eq. (11.39); (2) the steady state 
probabilities can be found by solving the global balance equation, Eq. (11.40b) 
or (11.40c). 

• A continuous-time Markov chain has a steady state if its embedded Markov 
chain is irreducible and positive recurrent with unique stationary pmf given by 
the solution of the global balance equations. 

• The time-reversed version of a Markov chain is also a Markov chain. A 
discrete-time (continuous-time) irreducible, stationary ergodic Markov chain is reversible 
if the transition probability matrix (transition rate matrix) for the forward and 
reverse processes is the same. 

• Matrix numerical methods can be used to find the time-dependent and the 
stationary probabilities of Markov chains. 


CHECKLIST OF IMPORTANT TERMS 


Accessible state 
Birth-and-death process 
Chapman-Kolmogorov equations 
Class of states 
Embedded Markov chain 
Ergodic Markov chain 
Global balance equations 
Homogeneous transition probabilities 
Irreducible Markov chain 
Markov chain 
Markov process 
Markov property 
Mean recurrence time 
Null recurrent state 
Period of a state/class 
Positive recurrent state 
Recurrent state/class 
Reversible Markov chain 
State 
State occupancy time 
State probabilities 
Stationary state pmf 
Stochastic matrix 
Time-reversed Markov chain 
Transient state/class 
Transition probability matrix 
Trellis diagram 

References [1] and [2] contain very good discussions of discrete-time Markov chains. 
Feller has a rich set of classic examples that are a pleasure to read. Reference [3] gives 
a concise but quite complete introduction to Markov chains. Reference [4] provides an 
introduction to discrete-time and continuous-time Markov chains at about the same 
level as this chapter. References [6] and [7] give a more rigorous and complete cover- 
age of Markov chains and processes. 

Section 11.1: Markov Processes 


11.1. Let $M_n$ denote the sequence of sample means from an iid random process $X_n$: 

$$M_n = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

(a) Is $M_n$ a Markov process? 
(b) If the answer to part a is yes, find the following state transition pdf: 
$f_{M_n}(x \mid M_{n-1} = y)$. 

11.2. An urn initially contains five black balls and five white balls. The following experiment is 
repeated indefinitely: A ball is drawn from the urn; if the ball is white, it is put back in the 
urn, otherwise it is left out. Let $X_n$ be the number of black balls remaining in the urn after 
n draws from the urn. 
(a) Is $X_n$ a Markov process? If so, find the appropriate transition probabilities and the 
corresponding trellis diagram. 
(b) Do the transition probabilities depend on n? 
(c) Repeat part a if the urn initially has K black balls and K white balls. 

11.3. An urn initially contains two black balls and two white balls. The following experiment is 
repeated indefinitely: A ball is drawn from the urn; with probability a, the color of the ball 
is changed to the other color and is then put back in the urn, otherwise it is put back 
without change. Let $X_n$ be the number of black balls in the urn after n draws from the urn. 
(a) Is $X_n$ a Markov process? If so, find the appropriate transition probabilities. 
(b) Do the transition probabilities depend on n? 
(c) Repeat part a if a = 1. What changes? 
(d) Repeat parts a and c if the urn contains K black balls and K white balls. 

11.4. Michael and Marisa initially have four pens each. Out of the total of eight pens, half are 
good and half are dry. The following experiment is repeated indefinitely: Michael and 
Marisa exchange a randomly selected pen from their set. Let $X_n$ be the number of good 
pens in Marisa’s set after n draws. 
(a) Is $X_n$ a Markov process? If so, find the appropriate transition probabilities. 
(b) Do the transition probabilities depend on n? 


(c) Repeat part a if Michael and Marisa initially have a total of K good pens and K dry 
pens. 

11.5. Does a Markov process have independent increments? Hint: Use the process in Problem 11.2 
to support your answer. 

11.6. Let $X_n$ be the Bernoulli iid process, and let $Y_n$ be given by 

$$Y_n = X_n + X_{n-1}.$$

It was shown in Example 11.2 that $Y_n$ is not a Markov process. Consider the vector 
process defined by $Z_n = (X_n, X_{n-1})$. 
(a) Show that $Z_n$ is a Markov process. 
(b) Find the state transition diagram for $Z_n$. 

11.7. (a) Show that the following autoregressive process is a Markov process: 

$$Y_n = rY_{n-1} + X_n, \qquad Y_0 = 0,$$

where $X_n$ is an iid process. 
(b) Find the transition pdf if $X_n$ is an iid Gaussian sequence. 

11.8. The amount of water in an aquifer at year end is a random variable $X_n$. The amount of 
water drawn from the aquifer in a year is a random variable $D_n$, and the amount restored 
by rainfall is $W_n$. 
(a) Find a set of equations to describe the total amount of water $X_n$ in the aquifer over time. 
(b) Under what conditions is $X_n$ a Markov process? 


Section 11.2: Discrete-Time Markov Chains 


11.9. Let $X_n$ be an iid integer-valued random process. Show that $X_n$ is a Markov process and 
give its one-step transition probability matrix. 

11.10. An information source generates iid bits $X_n$ for which P[0] = a = 1 - P[1]. 
(a) Suppose that $X_n$ is transmitted over a binary symmetric channel with error 
probability e. Find the probabilities of the outputs of the channel. 
(b) Suppose that $X_n$ is transmitted over K consecutive identical and independent binary 
symmetric channels. Does the sequence of channel outputs form a Markov chain? 
(c) Find the K-step transition probabilities that relate the input bits from the source to 
the outputs of the Kth channel. 
(d) What are the probabilities of the outputs of the Kth channel as $K \to \infty$? 

11.11. Each time unit a data multiplexer receives a packet with probability a, and/or transmits a 
packet from its buffer with probability b. Assume that the multiplexer can hold at most N 
packets. Let $X_n$ be the number of packets in the multiplexer at time n. 
(a) Show that the system can be modeled by a Markov chain. 
(b) Find the transition probability matrix P. 
(c) Find the stationary pmf. 

11.12. Let $X_n$ be the Markov chain defined for the urn experiment in Problem 11.2. 
(a) Find the one-step transition probability matrix P for $X_n$. 
(b) Find the two-step transition probability matrix $P^2$ by matrix multiplication. Check 
your answer by computing $p_{54}(2)$ and comparing it to the corresponding entry in $P^2$. 
(c) What happens to $X_n$ as n approaches infinity? Use your answer to guess the limit of 
$P^n$ as $n \to \infty$. 

11.13. Let $X_n$ be the Markov chain defined in Problem 11.3. 
(a) Find the one-step transition probability matrix P for $X_n$ with a = 1/10. 
(b) Find $P^2$, $P^4$, and $P^8$ by matrix multiplication. 
(c) What happens to $X_n$ as n approaches infinity? 
(d) Repeat parts a, b, and c if a = 1. 

11.14. In the Ehrenfest model of heat exchange, two containers hold a total of p particles 
[Feller, pp. 121]. Each time instant a particle is selected at random and moved to the other 
container. Let $X_n$ be the number of particles in the first container. 
(a) Show that this model is the same as in Problem 11.3(d). 
(b) Use the state transition diagram to explain why the model exhibits a “central 
force.” 
(c) Show that the stationary pmf is given by a binomial pmf with parameters p and 1/2. 
Give an intuitive explanation for this result. 

11.15. Let $X_n$ be the pen-exchange Markov chain defined in Problem 11.4. 
(a) Find P. 
(b) Use Octave or a numerical program to find $P^2$, $P^4$, and $P^8$ by matrix multiplication. 
(c) What happens to $X_n$ as n approaches infinity? 

11.16. In the Bernoulli-Laplace model for diffusion, a total of 2p particles are distributed 
between two containers, and half of the particles are black and half are white [Feller, 
1968, pp. 378]. Each time instant a particle is selected at random from each container 
and moved to the other container. Let $X_n$ be the number of white particles in the first 
container. 
(a) Show that this model is the same as in Problem 11.4(c). 
(b) Show that the stationary pmf is given by: 

$$\pi_j = \binom{p}{j}^2 \Big/ \binom{2p}{p} \qquad \text{for } j = 0, 1, \ldots, p.$$

11.17. The vector process $Z_n$ in Problem 11.6 has four possible states, so in effect it is equivalent 
to a Markov chain with states {0, 1, 2, 3}. 
(a) Find the one-step transition probability matrix P. 
(b) Find $P^2$ and check your answer by computing the probability of going from state 
(0, 1) to state (0, 1) in two steps. 
(c) Show that $P^n = P^2$ for all n > 2. Give an intuitive justification for why this is true 
for this random process. 
(d) Find the steady state probabilities for the process. 

11.18. Consider a sequence of Bernoulli trials with probability of success p and let $X_n$ denote 
the number of consecutive successes in a streak up to time n. 
(a) Show that $X_n$ is a Markov chain. 
(b) Find the one-step transition probability and draw the corresponding state transition 
diagram. 
(c) Find the stationary pmf assuming p < 1. 

11.19. Two gamblers play the following game. A fair coin is flipped; if the outcome is heads, 
player A pays player B $1, and if the outcome is tails player B pays player A $1. The game 
is continued until one of the players goes broke. Suppose that initially player A has $1 
and player B has $2, so a total of $3 is up for grabs. Let $X_n$ denote the number of dollars 
held by player A after n trials. 
(a) Show that $X_n$ is a Markov chain. 
(b) Sketch the state transition diagram for $X_n$ and give the one-step transition 
probability matrix P. 
(c) Use the state transition diagram to help you show that for n even (i.e., n = 2k), 

$$p_{ii}(n) = \left(\tfrac{1}{2}\right)^n \ \text{for } i = 1, 2 \quad \text{and} \quad p_{10}(n) = \tfrac{2}{3}\left(1 - \left(\tfrac{1}{4}\right)^k\right) = p_{23}(n).$$

(d) Find the n-step transition probability matrix for n even using part c. 
(e) Find the limit of $P^n$ as $n \to \infty$. 
(f) Find the probability that player A eventually wins. 

11.20. A certain part of a machine can be in two states: working or undergoing repair. A working 
part fails during the course of a day with probability a. A part undergoing repair is put into 
working order during the course of a day with probability b. Let $X_n$ be the state of the part. 
(a) Show that $X_n$ is a two-state Markov chain and give its one-step transition 
probability matrix P. 
(b) Find the n-step transition probability matrix $P^n$. 
(c) Find the steady state probability for each of the two states. 

11.21. A machine consists of two parts that fail and are repaired independently. A working part 
fails during any given day with probability a. A part that is not working is repaired by the 
next day with probability b. Let $X_n$ be the number of working parts in day n. 
(a) Show that $X_n$ is a three-state Markov chain and give its one-step transition 
probability matrix P. 
(b) Show that the steady state pmf $\boldsymbol{\pi}$ is binomial with parameter p = b/(a + b). 
(c) What do you expect is the steady state pmf for a machine that consists of n parts? 

11.22. A stochastic matrix is defined as a nonnegative matrix for which the elements of each 
row add to one. 
(a) Show that the transition probability matrix P for a Markov chain is a stochastic matrix. 
(b) Show that if P and Q are stochastic matrices, then PQ is also a stochastic matrix. 
(c) Show that if P is a stochastic matrix, then $P^n$ is also a stochastic matrix. 

11.23. Show that if $P^k$ has identical rows, then $P^j$ has identical rows for all $j \ge k$. 

11.24. Prove Eq. (11.14) by induction. 


Section 11.3: Classes of States, Recurrence Properties, and Limiting Probabilities 


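11.25. (a) Sketch the state-transition diagrams for the Markov chains with the following 
transition probability matrices. 
(b) Specify the classes of the Markov chains and classify them as recurrent or transient. 
(c) Use Octave to calculate the first few powers of each matrix. Note any interesting 
behavior. 

$$\text{(i)}\ \begin{bmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 1 & 0 & 0 \end{bmatrix} \qquad \text{(ii)}\ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \qquad \text{(iii)}\ \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \end{bmatrix}$$

$$\text{(iv)}\ \begin{bmatrix} 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \qquad \text{(v)}\ \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1/2 & 0 & 1/4 & 1/4 \\ 0 & 1/4 & 1/4 & 1/2 \end{bmatrix}$$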
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11.26. Characterize the long-term behavior of the Markov chains in Problem 11.25. Find the 
long-term proportion of time spent in each state. Find the stationary pmf where applica- 
ble and determine whether it is unique. 


11.27. Consider a three-state Markov chain. Select transition probabilities and sketch the asso- 
ciated transition diagram to produce the following attributes: 


(a) $X_n$ is irreducible.
(b) $X_n$ has one transient class and one recurrent class.
(c) $X_n$ has two recurrent classes.


11.28. (a) Find the transition probability matrices for the Markov chains with the state transi- 
tion diagrams shown in Fig. P11.1. 


FIGURE P11.1


(b) Specify the classes of the Markov chains and classify them as recurrent or transient; 
periodic or aperiodic. 


(c) Characterize the long-term behavior of the Markov chains and find the long-term 
proportion of time spent in each state, and the stationary pmf where applicable. 
(d) Use Octave to evaluate $P^n$ for n = 1, 2, 3, 4, 5. Explain any interesting results you
may find.
11.29. (a) Apply the PageRank modeling procedure to the Markov chains in Problem 11.28 to 
find the transition probability matrix. 
(b) Find the PageRank value for each node. 
11.30. Consider a random walk in the set {0, 1, ..., M} with transition probabilities

$p_{01} = 1$, $p_{M,M-1} = 1$, and $p_{i,i-1} = q$, $p_{i,i+1} = p$ for $i = 1, \ldots, M-1$.

(a) Sketch the state transition diagram.

(b) Find the long-term proportion of time spent in each state, and the limit of p(n) as
$n \to \infty$. Evaluate the special case when p = 1/2.


11.31. Repeat Problem 11.30 if the random walk is modified so that

$p_{01} = p$, $p_{00} = q$, $p_{M,M-1} = q$, and $p_{M,M} = p$.


11.32. For a finite-state, irreducible Markov chain, explain why none of the states can have zero
probability.

11.33. Suppose that state i belongs to a recurrent class of a finite-state Markov chain and that
$p_{ii}(1) > 0$. Show that i belongs to a class which is aperiodic.

11.34. Prove that positive and null recurrence are class properties.

11.35. In this problem we develop expressions for recurrence probabilities and expectations.
Let $a_n = f_{ii}(n)$ be the probability that a first return to state i from state i occurs after n
steps; and let $b_n = p_{ii}(n)$ be the probability of a return to state i from state i after n steps.

(a) Show that $b_n = \sum_{j=0}^{n} a_j b_{n-j}$ where $b_0 = 1$, $a_0 = 0$. Hint: Use conditional probability.

(b) Let A(z) and B(z) be the generating functions of $\{a_n\}$ and $\{b_n\}$ as defined in Eq. (4.84).
Explain why the series converge for |z| < 1, and show that $B(z) = \frac{1}{1 - A(z)}$.

(c) Show that $f_i = \lim_{z \to 1} A(z)$.

(d) Show that state i is recurrent if and only if $\lim_{z \to 1} B(z) = \infty$.

11.36. Consider a Markov chain with state space {0, 1, 2, ...} and the following transition
probabilities:

$p_{0j} = f_j$ and $p_{j,j-1} = 1$ where $1 = f_1 + f_2 + \cdots + f_j + \cdots$.

(a) Sketch the state transition diagram.
(b) Determine whether the Markov chain is irreducible.
(c) Determine whether state 0 is transient, or null/positive recurrent.
(d) Find an expression for the stationary pmf, if it exists.
(e) Provide specific answers to parts c and d if $\{f_j\}$ is given by the following pmfs: (i) geometric;
(ii) Zipf. (See Eq. (3.51).)

11.37. Consider a Markov chain with state space {1, 2, ...} and the following transition
probabilities:

$p_{j,j+1} = a_j$ and $p_{j,1} = 1 - a_j$ where $0 < a_j < 1$.

(a) Sketch the state transition diagram.

(b) Determine whether the Markov chain is irreducible.

(c) Determine whether state 1 is transient, or null/positive recurrent.

(d) Find an expression for the stationary pmf, if it exists.

(e) Provide specific answers to parts c and d if:
(i) $a_j = 1/2$ for all j; (ii) $a_j = (j-1)/j$; (iii) $a_j = 1/j$; (iv) $a_j = (1/2)^j$; (v) $a_j = 1 - (1/2)^j$.

11.38. Let $X_n$ and $Y_n$ be two ergodic Markov chains with the same state space but different
transition probability matrices, $P_1$ and $P_2$, respectively, and different stationary pmf's.

(a) A new process is constructed as follows. A coin is flipped and if the outcome is heads,
$P_1$ is used to generate the entire sequence; but if the outcome is tails, $P_2$ is used instead.
Is the resulting process Markov and does it have a stationary pmf? Is it ergodic?

(b) Repeat part a if the process is constructed as follows. A coin is flipped before every
time instant and the associated transition probability matrix is used to determine the
next state.

(c) Repeat part a if the state for odd (even) time instants is determined according to $P_1$ ($P_2$).

11.39. Find the probability of state 1 for the processes in Problem 11.38(a-c) if $X_n$ and $Y_n$ are
two processes from Problem 11.37(e) with two different geometric pmfs in (i) and (iv).

11.40. Construct a multiclass infinite-state Markov chain that has the following attributes:
(a) One class is transient and one class is null recurrent.
(b) One class is null recurrent and one class is positive recurrent.


Section 11.4: Continuous-Time Markov Chains 


11.41. Consider the simple queueing system discussed in Example 11.36.
(a) Use the results in Example 11.36 to find the state transition probability matrix.
(b) Find the following probabilities:

P[X(1.5) = 1, X(3) = 1 | X(0) = 0]
P[X(1.5) = 1, X(3) = 1].

11.42. A rechargeable battery in a depot is in one of three states: fully charged, in use, or
recharging. Assume the mean time in each of these states is: $1/\lambda$; 1 hour; 3 hours. Batteries
are not put into use unless they are fully charged.

(a) Find a Markov model for the battery states and sketch the state transition diagram.

(b) Find the stationary pmf. Explain how the pmf varies with $\lambda$.

11.43. Suppose that the depot in Problem 11.42 has two batteries. Define the state at time t by
$\{N_r(t), N_u(t), N_c(t)\}$, that is, by the number of batteries in each state.

(a) Sketch the state transition diagram for a six-state Markov chain for the system.

(b) Find the stationary pmf and evaluate it for various values of $\lambda$.

11.44. Rolo, a Chihuahua, spends most of the daytime sleeping in the kitchen. When a person
enters the kitchen, Rolo greets him or her and wags her tail for an average time of one
minute. At the end of this period Rolo is fed with probability 1/4, patted briefly with
probability 5/8, or taken for a walk with probability 1/8. If fed, Rolo spends an average of
two minutes eating. The walks take 15 minutes on average. After eating, being patted, or
walking, she returns to sleep. Assume that people enter the kitchen on average every hour.

(a) Find a Markov chain model with four states: {sleep, greet, eat, walk}. Specify the
transition rate matrix.

(b) Find the steady state probabilities.

11.45. A critical part of a machine has an exponentially distributed lifetime with parameter
$\alpha = 1$. Suppose that n = 4 spare parts are initially in stock, and let N(t) be the number of
spares left at time t.

(a) Find $p_{ij}(t) = P[N(s + t) = j \mid N(s) = i]$.

(b) Find the transition probability matrix.

(c) Find $p_j(t)$.

(d) Plot $p_j(t)$ versus time for j = 0, 1, 2, 3, 4.

(e) Give the general solution for $p_j(t)$ for arbitrary $\alpha > 0$ and n.

11.46. A shop has n = 3 machines and one technician to repair them. A machine remains in the
working state for an exponentially distributed time with parameter $\mu = 1/3$. The technician
works on one machine at a time, and it takes him an exponentially distributed time of
rate $\alpha = 1$ to repair each machine. Let X(t) be the number of working machines at time t.

(a) Show that if X(t) = k, then the time until the next machine breakdown is an exponentially
distributed random variable with rate $k\mu$.


(b) Find the transition rate matrix $[\gamma_{ij}]$ and sketch the transition rate diagram for X(t).
(c) Write the global balance equations and find the steady state probabilities for X(t).
(d) Redo parts b and c if the number of technicians is increased to 2.

(e) Find the steady state probabilities for arbitrary values of n, $\alpha$, and $\mu$.

11.47. A speaker alternates between periods of speech activity and periods of silence. Suppose
that the former are exponentially distributed with mean $1/\alpha = 200$ ms and the latter
exponentially distributed with mean $1/\beta = 400$ ms. Consider a group of n = 4 independent
speakers and let N(t) denote the number of speakers in speech activity at time t.

(a) Find the transition rate diagram and the transition rate matrix for this system.

(b) Write the global balance equations and show that the steady state pmf is given by a
binomial distribution. Why is this solution not surprising?

(c) Find the steady state probabilities for arbitrary values of n, $\alpha$, and $\beta$.

11.48. A continuous-time Markov chain X(t) can be approximated by a sampled-time discrete-time
Markov chain $X_n = X(n\delta)$ where the sampling interval is $\delta$ seconds.

(a) Find the transition probabilities for $X_n$ if X(t) is the M/M/1 queue in Example 11.39.
(b) Find the stationary pmf for part a. Compare to the answer in the example.

11.49. Consider the single-server queueing system in Example 11.39. Suppose that at most K
customers can be in the system at any time. Let N(t) be the number of customers in the
system at time t. Find the steady state probabilities for N(t).

11.50. (a) Find the embedded Markov chain for the process described in Example 11.39.
(b) Find the stationary pmf of the embedded Markov chain.

(c) Characterize the long-term probabilities of the process using Eq. (11.50).

11.51. Repeat Problem 11.50 for the process described in Example 11.40.

11.52. Suppose that the embedded Markov chain for the process N(t) is given by the discrete-time
Markov chain in Problem 11.36 with $\{f_j\}$ given by a geometric pmf. Find the steady
state probabilities of N(t), if they exist, in the following cases:

(a) The occupancy times of all states are exponentially distributed with mean 1.
(b) The occupancy time of state j is exponentially distributed with mean j.
(c) The occupancy time of state j is exponentially distributed with mean $2^j$.


*Section 11.5: Time-Reversed Markov Chains 


11.53. N balls are distributed in two urns. At time n, a ball is selected at random, removed from
its present urn, and placed in the other urn. Let $X_n$ denote the number of balls in urn 1.

(a) Find the transition probabilities for $X_n$.

(b) Argue that the process is time reversible and then obtain the steady state probabilities
for $X_n$.

11.54. A point moves in the unit circle in jumps of ±90°. Suppose that the process is initially at
0°, and that the probability of +90° is p.

(a) Find the transition probabilities for the resulting Markov chain and obtain the
steady state probabilities.

(b) Is the process reversible? Why or why not?

11.55. Find the transition probabilities for the time-reversed version of the random walk
discussed in Problem 11.31. Is the process reversible?

11.56. Is the Markov chain in Problem 11.16 time reversible?

11.57. Is the Markov chain in Problem 11.17 time reversible?

11.58. (a) Specify the time-reversed version of the process defined in Problem 11.49. Is the
process reversible?

(b) Find the steady state probabilities of the process using Eq. (11.67).

11.59. Use the results of Example 11.42 to find the stationary pmf of the Markov chains in
Problem 11.37(i).

11.60. Determine whether the simple queueing system in Example 11.36 is reversible.

11.61. Determine whether the machine repair model in Problem 11.46 is reversible.

11.62. (a) Is the speech activity model in Problem 11.47 reversible?

(b) Is the model reversible if $\alpha = \beta$?


*Section 11.6: Numerical Techniques for Markov Chains 


11.63. Consider the urn experiment in Problem 11.2.

(a) Use matrix diagonalization to find an expression for the state pmf as a function of
time. Plot the state pmf vs. time.

(b) Run a simulation for this urn experiment 100 times and build a histogram of the
number of steps that take place until the last black ball is removed.

(c) Derive the pmf for the number of steps that elapse until the last black ball is removed.
Compare the theoretical pmf with the observed histogram in part b.

11.64. Consider the Bernoulli-Laplace diffusion model from Problem 11.16 with p = 5.

(a) Use matrix diagonalization to obtain an expression for the time-dependent state
pmf. Plot the state pmf vs. time for different initial conditions.

(b) Write a simulation for the model and make several observations of 200-step sample
functions. Is the process ergodic? Is it necessary to perform multiple realizations of
the process, or does it suffice to collect statistics from one long realization?

(c) Compare histograms of the state occupancy and compare to the theoretical result
for: 5 separate realizations of 200 steps; 1 realization of 1000 steps.

(d) Use the autocov function in Octave to estimate the covariance function of the process.

11.65. Consider the data multiplexer in Problem 11.11.

(a) Derive the transition probabilities for the multiplexer assuming a maximum state of
N = 100. Find the steady state pmf for the following parameters: b = 0.5 and
a = 0.1, a = 0.25, a = 0.50.

(b) Simulate the data multiplexer for each of the cases in part a. Run the simulation for
1000 steps.

(c) For each realization record a histogram of the length of idle periods (when the system
remains continuously empty) and the length of the busy periods (when the system
remains continuously nonempty). Which of the three choices of parameters
above correspond to “heavy traffic”; “light traffic”?

11.66. Consider the gamblers’ experiment in Problem 11.19 with player A beginning with $6
and player B with $3.

(a) Find the transition probability P and obtain an expression for $P^n$. What is the probability
that player A wins? What is the average time until player A wins (when he wins)?

(b) Simulate 500 trials of the experiment. Find the relative frequency of player A winning
and compare to the theoretical result.

(c) Find the mean time until player A wins; until player B wins. Compare to the theoretical
results.


11.67. Consider the residual lifetime process in Problem 11.36. Assume a maximum state of 100.

(a) Simulate 1000 steps of the process with a geometric random variable with mean 5.
Record histograms of the state pmf and obtain the autocovariance of the realization.

(b) Repeat part a with a Zipf random variable of mean 5. Compare the histogram and
autocovariance to those found in part a.

11.68. Consider the age process in Problem 11.37. Assume a maximum state of 100.

(a) Simulate 1000 steps of the process with $a_j = (j - 1)/j$. Does the process behave as
expected?

(b) Repeat part a with $a_j = 1 - (1/2)^j$.

11.69. Consider the battery experiment in Problem 11.43.

(a) Use matrix diagonalization to obtain the time-dependent state transition probabilities
for $\lambda = 0.1, 1, 10$. What are the steady state probabilities? What are the corresponding
embedded state probabilities?

(b) Simulate 500 hours of operation and observe the histogram of the embedded state
occupancies. Compare to the theoretical results.

11.70. Consider the machine repair model in Problem 11.46. Assume n = 10 machines,
$\mu = 1/10$ average working time, and $\alpha = 1$.

(a) Obtain the time-dependent state transition probabilities for 1 and 2 technicians.
What are the steady state probabilities? What are the corresponding embedded
state probabilities?

(b) Simulate 1000 hours of operation and observe the histogram of the embedded state
occupancies. Compare to the theoretical results.

11.71. Use the simulator developed in Example 11.49 to simulate a sampled-time approximation
to the birth-death process shown in Figure 11.20(b). Simulate 200 seconds of an
M/M/1 queue in which jobs arrive at rate $\lambda = 0.9$ jobs per second and jobs complete
processing at a rate of 1 job every second. Assume the system is initially empty. Show the
realizations of the sampled process and measure the proportion of time spent in each state.
Compare these to the theoretical values.


Problems Requiring Cumulative Knowledge 


11.72. (a) The Markov chain in Fig. 11.6(b) is started in state 0 at time 0. Find the n-step
transition probability matrix for even and odd numbers of steps. What happens as
$n \to \infty$?

(b) Let $X_n$ be an irreducible, periodic, positive recurrent Markov chain in steady state. Is
$X_n$ a cyclostationary random process?

11.73. Let $X_n$ be an ergodic Markov chain. Let $I_j(n)$ be the indicator function for state j at time
n, that is, $I_j(n)$ is 1 if the state at time n is j, and 0 otherwise. What is the limiting value of
the time average of $I_j(n)$? Is this result an ergodic theorem?

11.74. Let X(t) be a continuous-time model for speech activity, in which a speaker is active
(state 1) for an exponentially distributed time with rate $\alpha$ and is silent (state 0) for an
exponentially distributed time with rate $\beta$. Assume all active and silence durations are
independent random variables.

(a) Find a two-state Markov chain for X(t).

(b) Find $p_0(t)$ and $p_1(t)$.

(c) Find the autocorrelation function of X(t).

(d) If X(t) is asymptotically wide-sense stationary, find its power spectral density.

(e) Suppose we have n independent speakers, and let N(t) be the total number of speakers
active at time t. Find the autocorrelation function of N(t), and its power spectral
density if it is asymptotically wide-sense stationary.


11.75. Let $X_n$ be a continuous-valued discrete-time Markov process.

(a) Find the expression for the joint pdf corresponding to Eq. (11.5).

(b) Find the expression for the two-step transition pdf corresponding to Eq. (11.12a).

11.76. Consider the aquifer in Problem 11.8.

(a) Find a recursive equation for the amount of water in the aquifer $X_{n+1}$ in year n + 1
in terms of the amount of water in year n, the amount withdrawn for use $D_n$, and
the amount restored by rainfall $W_n$. Note that the amount of water must be nonnegative.

(b) Find an integral expression relating the steady state pdf of X to the pdf’s of W and
D. Assume that W and D are independent and Gaussian random variables. Propose
possible approaches to solving these equations.

(c) Write a computer simulation to investigate the distribution of X as a function of W
and D assuming: $W_n$ and $D_n$ are iid random variables with the same mean; $D_n$ is an iid
sequence of random variables, but $W_n$ is independent with a slowly varying mean (with
period 100 years) that is equal to that of $D_n$ when averaged over the entire period.


CHAPTER 12

Introduction to Queueing Theory


In many applications, scarce resources such as computers and communication sys- 
tems are shared among a community of users. Users place demands for these 
resources at random times, and they require use of these resources for time periods 
whose durations are random. Inevitably requests for the resource arrive while the 
resource is occupied, and a mechanism to provide an orderly access to the resource 
is required. The most common access control mechanism is to file user requests in a 
waiting line or “queue” such as might be formed at a bank by customers waiting to 
be served. Resource sharing can also take place in systems of very large scale, e.g., 
peer-to-peer networks, where the “queues” are not as readily apparent.

Queueing theory deals with the study of waiting lines and resource sharing. The 
random nature of the demand behavior of customers implies that probabilistic mea- 
sures such as average delay, average throughput, and delay percentiles are required 
to assess the performance of such systems. Queueing theory provides us with the 
probability tools needed to evaluate these measures. 

This chapter is organized as follows: 


• Section 12.1 introduces the basic structure of a queueing system.

• Section 12.2 develops Little’s formula, which provides a fundamental relationship
that is applicable in most queueing systems.

• In Section 12.3 we examine the M/M/1 queue and use it to develop many of the
basic insights into queueing systems.

• Sections 12.4 and 12.5 develop multiserver systems and finite-source systems,
which can both be represented by Markov chains.

• Sections 12.6 and 12.7 develop M/G/1 queues, which require more complex
modeling.

• Sections 12.8 and 12.9 present Burke’s and Jackson’s theorems, which allow us to
model networks of queues.

• Finally, Section 12.10 considers the simulation of queueing systems.

FIGURE 12.1
(a) Elements of a queueing system. (b) Elements of a queueing system model: N(t), number in system; $N_q(t)$,
number in queue; $N_s(t)$, number in service; W, waiting time in queue; $\tau$, service time; and T, total time in the
system.


12.1 THE ELEMENTS OF A QUEUEING SYSTEM

Figure 12.1(a) shows a typical queueing system and Fig. 12.1(b) shows the elements of
a queueing system model. Customers from some population arrive at the system at the
random arrival times $S_1, S_2, S_3, \ldots, S_i, \ldots$, where $S_i$ denotes the arrival time of the ith
customer. We denote the customer arrival rate by $\lambda$.

The queueing system has one or more identical servers, as shown in Fig. 12.1(a).
The ith customer arrives at the system seeking a service that will require $\tau_i$ seconds of
service time from one server. If all the servers are busy, then the arriving customer joins
a queue where he remains until a server becomes available. Sometimes, only a limited
number of waiting spaces are available so customers that arrive when there is no room
are turned away. Such customers are called “blocked” and we will denote the rate at
which customers are turned away by $\lambda_b$.

The queue or service discipline specifies the order in which customers are selected
from the queue and allowed into service. For example, some common queueing
disciplines are first come, first served, and last come, first served. The queueing discipline
affects the waiting time $W_i$ that elapses from the arrival time of the ith customer until
the time when it enters service. The total delay $T_i$ of the ith customer in the system is
the sum of its waiting time and service time:

$$T_i = W_i + \tau_i. \tag{12.1}$$

From the customer’s point of view, the performance of the system is given by the
statistics of the waiting time W and the total delay T, and the proportion of customers
that are blocked, $\lambda_b/\lambda$. From the point of view of resource allocation, the performance
of the system is measured by the proportion of time that each server is utilized and the
rate at which customers are serviced by the system, $\lambda_d = \lambda - \lambda_b$. These quantities are
a function of N(t), the number of customers in the system at time t, and $N_q(t)$, the number
of customers in queue at time t.

The notation a/b/m/K is used to describe a queueing system, where a specifies 
the type of arrival process, b denotes the service time distribution, m specifies the 
number of servers, and K denotes the maximum number of customers allowed in 
the system at any time. If a is given by M, then the arrival process is Poisson and the 
interarrival times are independent, identically distributed (iid) exponential ran- 
dom variables. If b is given by M, then the service times are iid exponential random 
variables. If b is given by D, then the service times are constant, that is, determinis- 
tic. If b is given by G, then the service times are iid according to some general dis- 
tribution. For example, in this chapter we deal with M/M/1, M/M/1/K, M/M/c, 
M/M/c/c, M/D/1, and M/G/1 queues. 
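
As a small illustration of the a/b/m/K notation, the following Python sketch splits a descriptor into its four fields. The names `QueueSpec` and `parse_queue_notation` are hypothetical helpers introduced here, not part of the text. The sketch assumes that m and K are written as integers and that an omitted K means an unlimited number of customers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueSpec:
    arrival: str             # a: type of arrival process (e.g., "M")
    service: str             # b: service time distribution ("M", "D", "G")
    servers: int             # m: number of servers
    capacity: Optional[int]  # K: maximum number in system; None = unlimited (assumption)


def parse_queue_notation(spec: str) -> QueueSpec:
    """Parse a descriptor such as 'M/M/1' or 'M/M/1/5' (hypothetical helper)."""
    parts = spec.split("/")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected a/b/m or a/b/m/K, got {spec!r}")
    capacity = int(parts[3]) if len(parts) == 4 else None
    return QueueSpec(parts[0], parts[1], int(parts[2]), capacity)
```

For example, `parse_queue_notation("M/D/2/5")` describes Poisson arrivals, constant service times, two servers, and at most five customers in the system.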

Queueing system models find many applications in electrical and computer engi- 
neering. The “servers” in Fig. 12.1 can represent a variety of resources that perform 
“work.” For example, in communication networks, the server can represent a communi- 
cations line that transmits packets of information. In computer systems, the servers could 
represent processes in a computer that each handles Web queries from a particular 
client. Modern distributed applications combine these communications and computing 
resources into vast networks of interacting queueing systems. 


12.2 LITTLE'S FORMULA

We now develop Little’s formula, which states that, for systems that reach steady state,
the average number of customers in a system is equal to the product of the average
arrival rate and the average time spent in the system:

$$E[N] = \lambda E[T]. \tag{12.2}$$


This formula is valid under very general conditions, so it is applicable in an amazing 
number of situations. 

Consider the queueing system shown in Fig. 12.2. The system begins empty at time
t = 0, and the customer arrival times are denoted by $S_1, S_2, \ldots$. Let A(t) be the number
of customer arrivals up to time t. The ith customer spends time $T_i$ in the system and then
departs at time $D_i = S_i + T_i$. We will let D(t) be the number of customer departures up to
time t. The number of customers in the system at time t is the number of arrivals that have
not yet left the system:

$$N(t) = A(t) - D(t). \tag{12.3}$$

FIGURE 12.2
Time in system is departure time minus arrival time. Number in system at time t is number of arrivals
minus number of departures.

Figure 12.3 shows a possible sample path for A(t), D(t), and N(t) in a queueing system
with “first come, first served” service discipline.

Consider the time average of the number of customers in the system N(t) during
the interval (0, t]:

$$\langle N \rangle_t = \frac{1}{t} \int_0^t N(t')\,dt'. \tag{12.4}$$

In Fig. 12.3, N(t) is the region between A(t) and D(t), so the above integral is given by
the area of the enclosed region up to time t. It can be seen that each customer who has
departed the system by time t contributes $T_i$ to the integral, and thus the integral is simply
the total time all customers have spent in the system up to time t.

FIGURE 12.3
Total time spent by the first seven customers is the area in A(t) − D(t) up to time $t_0$.

Consider, for now, a time instant $t = t_0$ for which N(t) = 0 as in Fig. 12.3, then
the integral is exactly given by the sum of the $T_i$ of the first A(t) customers:

$$\langle N \rangle_t = \frac{1}{t} \sum_{i=1}^{A(t)} T_i. \tag{12.5}$$

The average arrival rate up to time t is given by

$$\langle \lambda \rangle_t = \frac{A(t)}{t}. \tag{12.6}$$

If we solve Eq. (12.6) for t and substitute into Eq. (12.5), we obtain

$$\langle N \rangle_t = \langle \lambda \rangle_t \frac{1}{A(t)} \sum_{i=1}^{A(t)} T_i. \tag{12.7}$$

Let $\langle T \rangle_t$ be the average of the times spent in the system by the first A(t) customers,
then

$$\langle T \rangle_t = \frac{1}{A(t)} \sum_{i=1}^{A(t)} T_i. \tag{12.8}$$

Comparing Eqs. (12.7) and (12.8), we conclude that

$$\langle N \rangle_t = \langle \lambda \rangle_t \langle T \rangle_t. \tag{12.9}$$

Finally, we assume that as $t \to \infty$, with probability one, the above time averages
converge to the expected value of the corresponding steady state random processes,
that is,

$$\langle N \rangle_t \to E[N], \quad \langle \lambda \rangle_t \to \lambda, \quad \langle T \rangle_t \to E[T]. \tag{12.10}$$

Equations (12.9) and (12.10) then imply Little’s formula:

$$E[N] = \lambda E[T]. \tag{12.11}$$

The restriction of t to instants $t_0$ where $N(t_0) = 0$ is not necessary. The time average
of N(t) up to an arbitrary time t′ as shown in Fig. 12.3 is given by the average up to time $t_0$
plus a contribution from the interval from $t_0$ to t′. If $E[N] < \infty$, then as t becomes large,
this contribution becomes negligible.

The assumption of first come, first served service discipline is not necessary. It 
turns out that Little’s formula holds for many service disciplines. See Problem 12.2 for 
examples. In addition, Little’s formula holds for systems with an arbitrary number of 
servers. 
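
The sample-path argument behind Eq. (12.9) can be checked numerically. The sketch below is an illustration, not the text's own program: it simulates a first come, first served single-server queue with exponential interarrival and service times, integrates N(t) = A(t) − D(t) up to the last departure (an instant at which the system is empty), and compares the time average with the product of the observed arrival rate and the observed mean time in system. The function names, the rates 0.5 and 1.0, and the seed are assumptions chosen for the example.

```python
import random


def simulate_fcfs(n_customers, lam, mu, seed=0):
    """Arrival and departure times for an FCFS single-server queue (illustrative)."""
    rng = random.Random(seed)
    arrivals, departures = [], []
    t, last_departure = 0.0, 0.0
    for _ in range(n_customers):
        t += rng.expovariate(lam)              # next arrival time S_i
        start = max(t, last_departure)         # wait if the server is busy
        last_departure = start + rng.expovariate(mu)
        arrivals.append(t)
        departures.append(last_departure)      # D_i = S_i + T_i
    return arrivals, departures


def time_average_number(arrivals, departures):
    """(1/t0) * integral of N(t) over (0, t0], with t0 the last event time."""
    events = sorted([(s, +1) for s in arrivals] + [(d, -1) for d in departures])
    area, n, prev = 0.0, 0, 0.0
    for time, step in events:
        area += n * (time - prev)              # N(t) is constant between events
        n += step
        prev = time
    return area / prev


arrivals, departures = simulate_fcfs(10_000, lam=0.5, mu=1.0)
t0 = departures[-1]                            # system is empty right after t0
n_avg = time_average_number(arrivals, departures)
lam_avg = len(arrivals) / t0                   # <lambda>_t as in Eq. (12.6)
t_avg = sum(d - s for s, d in zip(arrivals, departures)) / len(arrivals)
print(n_avg, lam_avg * t_avg)
```

Because the averages are taken at an instant when the system is empty, the two printed values agree up to floating-point rounding, independently of the rates chosen.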

Up to this point we have implicitly assumed that the “system” is the entire queueing
system, so N is the number in the queueing system and T is the time spent in the
queueing system. However, Little’s formula is so general that it applies to many
interpretations of “system.” Examples 12.1 and 12.2 show other designations for “system.”


Example 12.1 Mean Number in Queue 


Let $N_q(t)$ be the number of customers waiting in queue for the server to become available, and
let the random variable W denote the waiting time. If we designate the queue to be the “system,”
then Little’s formula becomes

$$E[N_q] = \lambda E[W]. \tag{12.12}$$


Example 12.2 Server Utilization 


Let $N_s(t)$ be the number of customers that are being served at time t, and let $\tau$ denote the service
time. If we designate the set of servers to be the “system,” then Little’s formula becomes

$$E[N_s] = \lambda E[\tau]. \tag{12.13}$$

$E[N_s]$ is the average number of busy servers for a system in steady state.

For single-server systems, $N_s(t)$ can only be 0 or 1, so $E[N_s]$ represents the proportion of
time that the server is busy. If $p_0 = P[N(t) = 0]$ denotes the steady state probability that the
system is empty, then we must have that

$$1 - p_0 = E[N_s] = \lambda E[\tau] \tag{12.14}$$

or

$$p_0 = 1 - \lambda E[\tau], \tag{12.15}$$

since $1 - p_0$ is the proportion of time that the server is busy. For this reason, the utilization of a
single-server system is defined by

$$\rho = \lambda E[\tau]. \tag{12.16}$$

We similarly define utilization of a c-server system by

$$\rho = \frac{\lambda E[\tau]}{c}. \tag{12.17}$$

From Eq. (12.13), $\rho$ represents the average fraction of busy servers.
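
A short numerical sketch of Eqs. (12.15) through (12.17) follows. The function names and the numbers (2 customers per second, mean service time 0.3 seconds) are illustrative assumptions, not values from the text.

```python
def utilization(arrival_rate, mean_service_time, servers=1):
    """Average fraction of busy servers: lambda * E[tau] / c."""
    return arrival_rate * mean_service_time / servers


def probability_empty_single_server(arrival_rate, mean_service_time):
    """p0 = 1 - lambda * E[tau] for a single-server system in steady state."""
    rho = utilization(arrival_rate, mean_service_time)
    if rho >= 1:
        raise ValueError("no steady state: lambda * E[tau] must be below 1")
    return 1 - rho


# Illustrative numbers only.
rho_single = utilization(2.0, 0.3)                    # single server
p_empty = probability_empty_single_server(2.0, 0.3)   # fraction of time idle
rho_three = utilization(2.0, 0.3, servers=3)          # same load, c = 3 servers
print(rho_single, p_empty, rho_three)
```

Spreading the same offered load over three servers divides the per-server utilization by three, which is the content of Eq. (12.17).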


12.3 THE M/M/1 QUEUE

Consider a single-server system in which customers arrive according to a Poisson process
of rate $\lambda$ so the interarrival times are iid exponential random variables with mean $1/\lambda$.
Assume that the service times are iid exponential random variables with mean $1/\mu$, and
that the interarrival and service times are independent. In addition, assume that the system
can accommodate an unlimited number of customers. The resulting system is an
M/M/1 queueing system. In this section we find the steady state pmf of N(t), the number
of customers in the system, and the pdf of T, the total customer delay in the system.

Distribution of Number in the System


The number of customers N(t) in an M/M/1 system is a continuous-time Markov
chain. To see why, suppose we are given that N(t) = k, and consider the next possi- 
ble change in the number in the system. The time until the next arrival is an expo- 
nential random variable that is independent of the service times of customers 
already in the system. The memoryless property of the exponential random variable 
implies that this interarrival time is independent of the present and past history of 
N(t). If the system is nonempty (i.e., N(t) > 0) the time until the next departure is 
also an exponential random variable. The memoryless property implies that the 
time until the next departure is independent of the time already spent in service. 
Thus if we know that N(t) = k, then the past history of the system is irrelevant as 
far as the probabilities of future states are concerned. This is the property required 
of a Markov chain. 

To find the transition rates for N(t), consider the probabilities of the various ways
in which N(t) can change.

(i) Since A(t), the number of arrivals in an interval of length t, is a Poisson process,
the probability of one arrival in an interval of length $\delta$ is

$$P[A(\delta) = 1] = \frac{\lambda\delta}{1!} e^{-\lambda\delta} = \lambda\delta\left\{1 - \frac{\lambda\delta}{1!} + \frac{(\lambda\delta)^2}{2!} - \cdots\right\} = \lambda\delta + o(\delta). \tag{12.18}$$

(ii) Similarly, the probability of more than one arrival in an interval of length $\delta$ is

$$P[A(\delta) \ge 2] = o(\delta). \tag{12.19}$$

(iii) Since the service time is an exponential random variable $\tau$, the time a customer
has spent in service is independent of how much longer he will remain in service
because of the memoryless property of $\tau$. In particular, the probability of a
customer in service completing his service in the next $\delta$ seconds is

$$P[\tau \le \delta] = 1 - e^{-\mu\delta} = \mu\delta + o(\delta). \tag{12.20}$$

(iv) Since service times and the arrival process are independent, the probability of
one arrival and one departure in an interval of length $\delta$ is

$$P[A(\delta) = 1, \tau \le \delta] = P[A(\delta) = 1]\,P[\tau \le \delta] = o(\delta) \tag{12.21}$$

from Eqs. (12.18) and (12.20). Similarly, the probability of any change that involves
more than a single arrival or a single departure is $o(\delta)$.


Properties (i) through (iv) imply that N(t) has the transition rate diagram shown in
Fig. 12.4. The global balance equations for the steady state probabilities are

$$\lambda p_0 = \mu p_1$$
$$(\lambda + \mu)p_j = \lambda p_{j-1} + \mu p_{j+1}, \qquad j = 1, 2, \ldots. \tag{12.22}$$

FIGURE 12.4
Transition rate diagram for M/M/1 system.


In Example 11.39, we saw that a steady state solution exists when ρ = λ/μ < 1:

$$P[N(t) = j] = (1-\rho)\rho^j, \qquad j = 0, 1, 2, \ldots. \tag{12.23}$$

The condition ρ = λ/μ < 1 must be met if the system is to be stable in the sense
that N(t) does not grow without bound. Since μ is the maximum rate at which the server
can process customers, the condition ρ < 1 is equivalent to

$$\text{Arrival rate} = \lambda < \mu = \text{Maximum service rate}. \tag{12.24}$$


If the inequality is violated, we have customers arriving at the system faster than they 
can be processed and sent out. This is an unstable situation in which the number in the 
queue will grow steadily without bound. 

The mean number of customers in the system is given by

$$E[N] = \sum_{j=0}^{\infty} jP[N(t) = j] = \frac{\rho}{1-\rho}, \tag{12.25}$$

where we have used the fact that N has a geometric distribution (see Table 3.1).
The mean total customer delay in the system is found from Eq. (12.25) and Little's
formula:

$$E[T] = \frac{E[N]}{\lambda} = \frac{\rho}{\lambda(1-\rho)} = \frac{1/\mu}{1-\rho} = \frac{E[\tau]}{1-\rho} = \frac{1}{\mu-\lambda}. \tag{12.26}$$

The mean waiting time in queue is given by the mean of the total time in the system
minus the service time:

$$E[W] = E[T] - E[\tau] = \frac{\rho}{1-\rho}E[\tau]. \tag{12.27}$$

Little's formula then gives the mean number in queue:

$$E[N_q] = \lambda E[W] = \frac{\rho^2}{1-\rho}. \tag{12.28}$$

FIGURE 12.5
Mean number of customers in the system versus utilization for M/M/1
queue.


The server utilization (defined in Example 12.2) is given by

$$1 - p_0 = 1 - (1-\rho) = \rho = \frac{\lambda}{\mu}. \tag{12.29}$$

Figures 12.5 and 12.6 show E[N] and E[T] versus ρ. It can be seen that as ρ approaches
one, the mean number in the system and the system delay become arbitrarily large.


FIGURE 12.6
Mean total customer delay versus utilization for M/M/1 system. The delay
is expressed in multiples of mean service times.


Example 12.3


A router receives packets from a group of users and transmits them over a single transmission line.
Suppose that packets arrive according to a Poisson process at a rate of one packet every 4 ms, and
suppose that packet transmission times are exponentially distributed with mean 3 ms.
Find the mean number of packets in the system and the mean total delay in the system. What
percentage increase in arrival rate results in a doubling of the above mean total delay?

The arrival rate is λ = 1/4 packets/ms and the mean service time is E[τ] = 3 ms. The utilization is
therefore

$$\rho = \lambda E[\tau] = \frac{1}{4}(3) = \frac{3}{4}.$$

The mean number of packets in the system is then

$$E[N] = \frac{\rho}{1-\rho} = 3.$$

The mean time in the system is

$$E[T] = \frac{E[N]}{\lambda} = \frac{3}{1/4} = 12 \text{ ms}.$$

The mean time in the system will be doubled to 24 ms when

$$24 = \frac{E[\tau]}{1-\rho'} = \frac{3}{1-\rho'}.$$

The resulting utilization is ρ' = 7/8 and the corresponding arrival rate is λ' = ρ'μ = 7/24
packets/ms. The original arrival rate was 6/24 packets/ms. Thus an increase in arrival rate of
1/6 ≈ 17% leads to a 100% increase in mean system delay.

The point of this example is that the onset of congestion is swift. The mean delay increases 
rapidly once the utilization increases beyond a certain point. 
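
The arithmetic of Example 12.3 can be checked numerically. The following is a minimal sketch, assuming only Eqs. (12.25) through (12.28); the function names `mm1_metrics` and `arrival_rate_for_delay` are illustrative choices, not part of the text.

```python
def mm1_metrics(lam, mu):
    """Steady-state means for an M/M/1 queue; requires 0 < lam < mu."""
    if not 0 < lam < mu:
        raise ValueError("need 0 < lam < mu for a stable queue")
    rho = lam / mu
    mean_n = rho / (1 - rho)      # Eq. (12.25)
    mean_t = 1 / (mu - lam)       # Eq. (12.26)
    mean_w = mean_t - 1 / mu      # Eq. (12.27)
    mean_nq = lam * mean_w        # Eq. (12.28)
    return {"rho": rho, "E[N]": mean_n, "E[T]": mean_t,
            "E[W]": mean_w, "E[Nq]": mean_nq}


def arrival_rate_for_delay(target_t, mu):
    """Arrival rate at which the mean delay 1/(mu - lam) equals target_t."""
    return mu - 1 / target_t


# Example 12.3: rates in packets per millisecond.
base = mm1_metrics(lam=1 / 4, mu=1 / 3)
lam2 = arrival_rate_for_delay(2 * base["E[T]"], mu=1 / 3)
increase = lam2 / (1 / 4) - 1     # fractional increase in arrival rate
print(base["E[N]"], base["E[T]"], lam2, increase)
```

Solving Eq. (12.26) for λ, rather than searching over ρ, keeps the sketch exact up to floating-point rounding.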


Example 12.4 Concentration and Effect of Scale 


A large processor handles transactions at a rate of Kμ transactions per second. Suppose
transactions arrive according to a Poisson process of rate Kλ transactions/second, and that transactions
require an exponentially distributed amount of processing time. Suppose that a proposal is made
to eliminate the large processor and to replace it with K processors, each with a processing rate of
μ transactions per second and an arrival rate of λ. Compare the mean delay performance of the
existing and the proposed systems.

The large processor system is an M/M/1 queue with arrival rate Kλ, service rate Kμ, and
utilization ρ = Kλ/Kμ = λ/μ. The mean delay is given by Eq. (12.26):

$$E[T] = \frac{E[\tau]}{1-\rho} = \frac{1/K\mu}{1-\rho}.$$

Each of the small processors is an M/M/1 system with arrival rate λ, service rate μ, and
utilization ρ = λ/μ. The mean delay is

$$E[T'] = \frac{E[\tau']}{1-\rho} = \frac{1/\mu}{1-\rho} = KE[T].$$

Thus, the system with the single large processor with processing rate Kμ has a smaller mean
delay than the system with K small processors each of rate μ. In other words, the concentration of
customer demand into a single system results in significant delay performance improvement.
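
The factor-of-K result can be illustrated with a few lines of code. This is a sketch only: the values of λ, μ, and K below are hypothetical and chosen for illustration, and `mm1_mean_delay` is an assumed helper name.

```python
def mm1_mean_delay(lam, mu):
    """Mean total delay of an M/M/1 queue, Eq. (12.26)."""
    if not 0 <= lam < mu:
        raise ValueError("need 0 <= lam < mu")
    return 1 / (mu - lam)


# Hypothetical parameters: each small processor sees lam and serves at mu.
lam, mu, K = 0.8, 1.0, 4
t_large = mm1_mean_delay(K * lam, K * mu)   # one processor of rate K*mu
t_small = mm1_mean_delay(lam, mu)           # one of the K small processors
print(t_large, t_small, t_small / t_large)
```

The ratio of the two delays equals K regardless of the utilization, because both systems share the same ρ.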

12.3.2 Delay Distribution in M/M/1 System and Arriving Customer's Distribution


Let $N_a$ denote the number of customers found in the system by a customer arrival. We
call $P[N_a = k]$ the arriving customer's distribution. We now show that if arrivals are
Poisson and independent of the system state and customer service times, then the
arriving customer's distribution is equal to the steady state distribution for the number in
the system. A customer that arrives at time t + δ finds k in the system if N(t) = k, thus

$$\begin{aligned} P[N_a(t) = k] &= \lim_{\delta \to 0} P[N(t) = k \mid A(t+\delta) - A(t) = 1] \\ &= \lim_{\delta \to 0} \frac{P[N(t) = k, A(t+\delta) - A(t) = 1]}{P[A(t+\delta) - A(t) = 1]} \\ &= \lim_{\delta \to 0} \frac{P[A(t+\delta) - A(t) = 1 \mid N(t) = k]\,P[N(t) = k]}{P[A(t+\delta) - A(t) = 1]}, \end{aligned}$$

where we have used the definition of conditional probability. The probability of an
arrival in the interval (t, t + δ] is independent of N(t), thus

$$P[N_a(t) = k] = \lim_{\delta \to 0} \frac{P[A(t+\delta) - A(t) = 1]\,P[N(t) = k]}{P[A(t+\delta) - A(t) = 1]} = P[N(t) = k].$$

Thus the probability that $N_a = k$ is simply the proportion of time during which the
system has k customers in the system. For the M/M/1 queueing system under consideration
we have

$$P[N_a = k] = P[N(t) = k] = (1-\rho)\rho^k. \tag{12.30}$$


We are now ready to compute the distribution for the total time T that a customer
spends in an M/M/1 system. Suppose that an arriving customer finds k in the system,
that is, $N_a = k$. If the service discipline is "first come, first served," then T is the
residual service time of the customer found in service, the service times of the k - 1
customers found in queue, and the service time of the arriving customer. The memoryless
property of the exponential service time implies that the residual service time of
the customer found in service has the same distribution as a full service time. Thus T is
the sum of k + 1 iid exponential random variables. In Example 7.5 we saw that this
sum has the gamma pdf

$$f_T(x \mid N_a = k) = \frac{(\mu x)^k}{k!}\mu e^{-\mu x}, \qquad x > 0. \tag{12.31}$$

The pdf of T is found by averaging over the probability of an arriving customer
finding k customers in the system, $P[N_a = k]$. Thus the pdf of T is

$$\begin{aligned} f_T(x) &= \sum_{k=0}^{\infty} \frac{(\mu x)^k}{k!}\mu e^{-\mu x}(1-\rho)\rho^k \\ &= (1-\rho)\mu e^{-\mu x}\sum_{k=0}^{\infty} \frac{(\mu\rho x)^k}{k!} \\ &= (1-\rho)\mu e^{-\mu x}e^{\mu\rho x} \\ &= (\mu-\lambda)e^{-(\mu-\lambda)x}, \qquad x > 0. \end{aligned} \tag{12.32}$$

Thus T is an exponential random variable with mean 1/(μ - λ). Note that this is in
agreement with Eq. (12.26) for the mean of T obtained through Little's formula.
We can similarly show that the pdf for the waiting time is

$$f_W(x) = (1-\rho)\delta(x) + \lambda(1-\rho)e^{-\mu(1-\rho)x}, \qquad x > 0. \tag{12.33}$$


Example 12.5

Find the 95% percentile of the total delay.

The pth percentile of T is that value of x for which

$$p = P[T \le x] = \int_0^x (\mu-\lambda)e^{-(\mu-\lambda)y}\,dy = 1 - e^{-(\mu-\lambda)x},$$

which yields

$$x = \frac{1}{\mu-\lambda}\ln\frac{1}{1-p} = -E[T]\ln(1-p). \tag{12.34}$$

The 95% percentile is obtained by substituting p = .95 above. The result is x = 3.0 E[T].
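
As a quick check of Eq. (12.34), the sketch below inverts the exponential cdf of Eq. (12.32) and confirms that the 95% percentile is about three mean delays. The function names and the numerical rates (taken from Example 12.3 for concreteness) are assumptions made for illustration.

```python
import math


def mm1_delay_cdf(x, lam, mu):
    """P[T <= x] for the M/M/1 total delay, from Eq. (12.32)."""
    return 1 - math.exp(-(mu - lam) * x) if x > 0 else 0.0


def mm1_delay_percentile(p, lam, mu):
    """Value x with P[T <= x] = p, Eq. (12.34)."""
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    if not 0 <= lam < mu:
        raise ValueError("need 0 <= lam < mu")
    return -math.log(1 - p) / (mu - lam)


lam, mu = 1 / 4, 1 / 3                        # packets per ms, as in Example 12.3
x95 = mm1_delay_percentile(0.95, lam, mu)
multiples_of_mean = x95 * (mu - lam)          # x95 divided by E[T]
print(x95, multiples_of_mean)
```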


The M/M/1 System with Finite Capacity 


Real systems can only accommodate a finite number of customers, but the assumption 
of infinite capacity is convenient when the probability of having a full system is negligi- 
ble. Consider the M/M/1/K queueing system that is identical to the M/M/1 system with 
the exception that it can only hold a maximum of K customers in the system. Customers 
that arrive when the system is full are turned away. 

The process N(t) for this system is a continuous-time Markov chain that
takes on values from the set {0, 1, ..., K} with transition rate diagram as shown in
Fig. 12.7. It can be seen that the arrival rate into the system is now zero when
N(t) = K. The transition rates from the other states are the same as for the M/M/1
system.

FIGURE 12.7
Transition rate diagram for M/M/1/K system.

The global balance equations are now

$$\lambda p_0 = \mu p_1$$
$$(\lambda + \mu)p_j = \lambda p_{j-1} + \mu p_{j+1}, \qquad j = 1, 2, \ldots, K-1$$
$$\mu p_K = \lambda p_{K-1}. \tag{12.35}$$

Let ρ = λ/μ. It can be readily shown (see Problem 12.14) that the steady state
probabilities are

$$P[N = j] = \frac{(1-\rho)\rho^j}{1-\rho^{K+1}}, \qquad j = 0, 1, 2, \ldots, K \tag{12.36}$$

for ρ < 1 or ρ > 1. When ρ = 1 all the states are equiprobable. Figure 12.8 shows the
steady state probabilities for various values of ρ.

The mean number of customers in the system is given by

$$E[N] = \sum_{j=0}^{K} jP[N(t) = j] = \begin{cases} \dfrac{\rho}{1-\rho} - \dfrac{(K+1)\rho^{K+1}}{1-\rho^{K+1}} & \text{for } \rho \ne 1 \\[2ex] \dfrac{K}{2} & \text{for } \rho = 1. \end{cases} \tag{12.37}$$

The mean total time spent by customers in the system is found from Eq. (12.37) by
using Little's formula with $\lambda_a$, the rate of arrivals that actually enter the system. The
proportion of time when the system turns away customers is $P[N(t) = K] = p_K$. Thus
the system turns away customers at the rate

$$\lambda_b = \lambda p_K, \tag{12.38}$$

and the actual arrival rate into the system is

$$\lambda_a = \lambda(1 - p_K). \tag{12.39}$$

Applying Little's formula to Eq. (12.37) we obtain

$$E[T] = \frac{E[N]}{\lambda_a} = \frac{E[N]}{\lambda(1 - p_K)}. \tag{12.40}$$

In finite-capacity systems, it is necessary to distinguish between the traffic load
offered to a system and the actual load carried by the system. The offered load, or
traffic intensity, is a measure of the demand made on the system and is defined as

$$\lambda\,\frac{\text{customers}}{\text{second}} \times E[\tau]\,\frac{\text{seconds of service}}{\text{customer}}. \tag{12.41}$$

The carried load is the actual demand met by the system:

$$\lambda_a\,\frac{\text{customers}}{\text{second}} \times E[\tau]\,\frac{\text{seconds of service}}{\text{customer}}. \tag{12.42}$$

FIGURE 12.8
Typical pmf's for N(t) of M/M/1/K system (panels for ρ < 1, ρ = 1, and ρ > 1).

FIGURE 12.9
(a) Carried load versus offered load for M/M/1/K system with K = 2 and K = 10. (b) Mean customer delay versus offered load
in M/M/1/K system with K = 2 and K = 10.


Example 12.6 Mean Delay and Carried Load Versus K 


Figure 12.9(a) gives a comparison of the carried load versus the offered load ρ for two values of
K. It can be seen that increasing the capacity K results in an increase in carried load since more 
customers are allowed into the system. Figure 12.9(b) gives the corresponding values for the 
mean delay. We see that increasing K results in increased delays, again because more customers 
are allowed into the system. 
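
The trends in Example 12.6 can be reproduced numerically. The sketch below evaluates Eqs. (12.36), (12.39), (12.40), and (12.42) for K = 2 and K = 10; the offered load of 0.9 and the function names are assumptions chosen for illustration.

```python
import math


def mm1k_pmf(rho, K):
    """Steady-state pmf of Eq. (12.36); uniform when rho equals 1."""
    if math.isclose(rho, 1.0):
        return [1 / (K + 1)] * (K + 1)
    norm = (1 - rho) / (1 - rho ** (K + 1))
    return [norm * rho ** j for j in range(K + 1)]


def mm1k_metrics(lam, mu, K):
    """Blocking probability, carried load, E[N], and E[T] for M/M/1/K."""
    pmf = mm1k_pmf(lam / mu, K)
    p_block = pmf[K]                                  # P[N = K]
    lam_a = lam * (1 - p_block)                       # Eq. (12.39)
    mean_n = sum(j * p for j, p in enumerate(pmf))    # definition in Eq. (12.37)
    return {"blocking": p_block,
            "carried": lam_a / mu,                    # Eq. (12.42)
            "E[N]": mean_n,
            "E[T]": mean_n / lam_a}                   # Eq. (12.40)


for K in (2, 10):
    print(K, mm1k_metrics(lam=0.9, mu=1.0, K=K))
```

Computing E[N] by direct summation over the pmf, instead of the closed form in Eq. (12.37), avoids a separate branch for ρ = 1.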


Example 12.7 


Suppose that an M/M/1 model is used for a system that has capacity K, and that the probability 
of rejecting customers is approximated by P[N = K]. Compare this approximation to the exact 
probability given by the M/M/1/K model. 

For the M/M/1 system the above probability is given by

$$P[N = K] = (1-\rho)\rho^K.$$

For ρ < 1, the probability of rejecting a customer in the M/M/1/K system is

$$P[N' = K] = \frac{(1-\rho)\rho^K}{1-\rho^{K+1}} = (1-\rho)\rho^K\left\{1 + \rho^{K+1} + (\rho^{K+1})^2 + \cdots\right\}.$$

For ρ < 1 and K large, P[N = K] ≈ P[N' = K]. For ρ > 1, the M/M/1 approximation breaks
down and gives a negative probability.
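
A short numerical comparison makes the quality of the approximation in Example 12.7 concrete. This is a sketch under the formulas stated in the example; the utilization values 0.8 and 1.5 are illustrative assumptions.

```python
def mm1_tail_approx(rho, K):
    """M/M/1 value P[N = K] = (1 - rho) rho^K, used as an approximation."""
    return (1 - rho) * rho ** K


def mm1k_blocking(rho, K):
    """Exact M/M/1/K rejection probability, Eq. (12.36) with j = K."""
    if rho == 1:
        return 1 / (K + 1)
    return (1 - rho) * rho ** K / (1 - rho ** (K + 1))


for K in (2, 5, 10, 20):
    approx, exact = mm1_tail_approx(0.8, K), mm1k_blocking(0.8, K)
    print(K, approx, exact, approx / exact)

# For rho > 1 the approximation is negative, while the exact value is a probability.
print(mm1_tail_approx(1.5, 5), mm1k_blocking(1.5, 5))
```

The ratio of approximate to exact values is 1 - ρ^(K+1), so the relative error decays geometrically in K when ρ < 1.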


12.4 MULTI-SERVER SYSTEMS: M/M/c, M/M/c/c, AND M/M/∞


We now modify the M/M/1 system to consider queueing systems with multiple servers. 
In particular, we consider systems with iid exponential interarrival times and iid expo- 
nential service times. As in the case of the M/M/1 system, the resulting systems can be 
modeled by continuous-time Markov chains. 


12.4.1 Distribution of Number in the M/M/c System 


The transition rate diagram for an M/M/c system is shown in Fig. 12.10. As before,
arrivals occur at a rate λ. The difference now is that the departure rate is kμ when k
servers are busy. To see why, suppose that k of the servers are busy; then the time until
the next departure is given by

$$X = \min(\tau_1, \tau_2, \ldots, \tau_k),$$

where the $\tau_i$ are iid exponential random variables with parameter μ. The complementary
cdf of this random variable is

$$\begin{aligned} P[X > t] &= P[\min(\tau_1, \tau_2, \ldots, \tau_k) > t] \\ &= P[\tau_1 > t, \tau_2 > t, \ldots, \tau_k > t] \\ &= P[\tau_1 > t]\,P[\tau_2 > t]\cdots P[\tau_k > t] \\ &= e^{-\mu t}e^{-\mu t}\cdots e^{-\mu t} \\ &= e^{-k\mu t}. \end{aligned} \tag{12.43}$$

FIGURE 12.10
Transition rate diagram for M/M/c system.

Thus the time until the next departure is an exponential random variable with mean
1/(kμ). So when k servers are busy, customers depart at rate kμ. When the number of
customers in the system is greater than c, all c servers are busy and the departure rate is cμ.

We obtain the steady state probabilities for the M/M/c system from the general
solution for birth-and-death processes found in Example 11.40. The probabilities of the
first c states are obtained from the following recursion (see Eq. 11.45):

$$p_j = \frac{\lambda}{j\mu}p_{j-1}, \qquad j = 1, 2, \ldots, c,$$

which leads to

$$p_j = \frac{a^j}{j!}p_0, \qquad j = 0, 1, \ldots, c, \tag{12.44}$$

where

$$a = \frac{\lambda}{\mu}. \tag{12.45}$$

The probabilities for states equal to or greater than c are obtained from the following
recursion:

$$p_j = \frac{\lambda}{c\mu}p_{j-1}, \qquad j = c, c+1, c+2, \ldots,$$

which leads to

$$p_j = \rho^{j-c}p_c, \qquad j = c, c+1, c+2, \ldots \tag{12.46a}$$
$$\phantom{p_j} = \rho^{j-c}\frac{a^c}{c!}p_0, \tag{12.46b}$$

where we have used Eq. (12.44) with j = c and where

$$\rho = \frac{\lambda}{c\mu}. \tag{12.47}$$

Finally $p_0$ is obtained from the normalization condition:

$$1 = \sum_{j=0}^{\infty} p_j = p_0\left\{\sum_{j=0}^{c-1}\frac{a^j}{j!} + \frac{a^c}{c!}\sum_{j=c}^{\infty}\rho^{j-c}\right\}.$$

The system is stable and has a steady state if the term inside the brackets is finite. This is
the case if the second series converges, which in turn requires that ρ < 1, or equivalently,

$$\lambda < c\mu. \tag{12.48}$$

In other words, the system is stable if the customer arrival rate is less than the total rate
at which the c servers can process customers. The final form for $p_0$ is

$$p_0 = \left\{\sum_{j=0}^{c-1}\frac{a^j}{j!} + \frac{a^c}{c!}\frac{1}{1-\rho}\right\}^{-1}. \tag{12.49}$$

The probability that an arriving customer finds all servers busy and has to wait in
queue is an important parameter of the M/M/c system:

$$P[W > 0] = P[N \ge c] = \sum_{j=c}^{\infty}\rho^{j-c}p_c = \frac{p_c}{1-\rho}. \tag{12.50}$$

This probability is called the Erlang C formula and is denoted by C(c, a):

$$C(c, a) = \frac{p_c}{1-\rho} = P[W > 0]. \tag{12.51}$$

The mean number of customers in queue is given by

$$\begin{aligned} E[N_q] &= \sum_{j=c}^{\infty}(j-c)\rho^{j-c}p_c = p_c\sum_{j'=0}^{\infty}j'\rho^{j'} \\ &= \frac{\rho}{(1-\rho)^2}p_c \\ &= \frac{\rho}{1-\rho}C(c, a). \end{aligned} \tag{12.52}$$

The mean waiting time is found from Little's formula:

$$E[W] = \frac{E[N_q]}{\lambda} = \frac{1/\mu}{c(1-\rho)}C(c, a). \tag{12.53}$$

The mean total time in the system is

$$E[T] = E[W] + E[\tau] = E[W] + \frac{1}{\mu}. \tag{12.54}$$

Finally, the mean number in the system is found from Little's formula:

$$E[N] = \lambda E[T] = E[N_q] + a, \tag{12.55}$$

where we have used Equation (12.54).


Example 12.8 


A company has two 1 Megabit/second lines connecting two of its sites. Suppose that packets for
these lines arrive according to a Poisson process at a rate of 150 packets per second, and that
packet lengths are exponentially distributed with mean 10 kbits. When both lines are busy, the system
queues the packets and transmits them on the first available line. Find the probability that a
packet has to wait in queue.

First we need to compute $p_0$. The system parameters are c = 2, λ = 150 packets/sec,
1/μ = 10 kbit/(1 Mbit/s) = 10 ms, a = λ/μ = 1.5, and ρ = λ/cμ = 3/4. Therefore:

$$p_0 = \left\{1 + 1.5 + \frac{(1.5)^2}{2!}\frac{1}{1 - 3/4}\right\}^{-1} = \frac{1}{7}.$$

The probability of having to wait is then

$$C(2, 1.5) = \frac{(1.5)^2}{2!}\frac{1}{1 - 3/4}\,p_0 = \frac{9}{14}.$$
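
The values in Example 12.8 follow directly from Eqs. (12.49) and (12.51). The sketch below evaluates them; the function names `mmc_p0` and `erlang_c` are assumptions made for illustration.

```python
import math


def mmc_p0(c, a):
    """Probability of an empty M/M/c system, Eq. (12.49)."""
    rho = a / c
    if not 0 < rho < 1:
        raise ValueError("need 0 < a < c for a stable system")
    head = sum(a ** j / math.factorial(j) for j in range(c))
    tail = a ** c / math.factorial(c) / (1 - rho)
    return 1 / (head + tail)


def erlang_c(c, a):
    """Erlang C formula, Eq. (12.51): probability that an arrival must wait."""
    rho = a / c
    p_c = a ** c / math.factorial(c) * mmc_p0(c, a)   # Eq. (12.44) with j = c
    return p_c / (1 - rho)


# Example 12.8: c = 2 lines, offered load a = 1.5.
print(mmc_p0(2, 1.5), erlang_c(2, 1.5))
```

With c = 1 the formula reduces to C(1, a) = ρ, the M/M/1 probability that the server is busy, which gives a convenient sanity check.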


Example 12.9 M/M/1 Versus M/M/c 


Compare the mean delay and mean waiting time performance of the two systems shown in Fig. 12.11.
Note that both systems have the same processing rate.

For the M/M/1 system, ρ = λ/μ = (1/2)/1 = 1/2, so the mean waiting time is

$$E[W] = \frac{\rho/\mu}{1-\rho} = 1 \text{ s},$$

and the mean total delay is

$$E[T] = \frac{1/\mu}{1-\rho} = 2 \text{ s}.$$

For the M/M/2 system, a = λ/μ' = 1, and ρ = λ/2μ' = 1/2. The probability of an empty
system is

$$p_0 = \left\{1 + a + \frac{a^2/2}{1-\rho}\right\}^{-1} = \frac{1}{3}.$$

The Erlang C formula is

$$C(2, 1) = \frac{a^2/2}{1-\rho}\,p_0 = \frac{1}{3}.$$

The mean waiting time is then

$$E[W'] = \frac{1/\mu'}{2(1-\rho)}C(2, 1) = \frac{2}{3} \text{ s},$$

and the mean delay is

$$E[T'] = \frac{2}{3} + \frac{1}{\mu'} = \frac{8}{3} \text{ s}.$$

FIGURE 12.11
M/M/1 and M/M/2 systems with the same arrival rate (λ = 1/2)
and the same maximum processing rate (System 1: one server with μ = 1; System 2: two servers with μ' = 1/2 each).

Thus the M/M/1 system has a smaller total delay but a larger waiting time than the M/M/2. In general,
for a fixed total processing rate, increasing the number of servers decreases the waiting time but increases the total delay.
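
The comparison in Example 12.9 can be repeated for other server counts. The sketch below applies Eqs. (12.49), (12.51), (12.53), and (12.54) with the total processing rate held fixed; the helper name `mmc_mean_wait_and_delay` is an assumption.

```python
import math


def mmc_mean_wait_and_delay(lam, mu, c):
    """Return (E[W], E[T]) for an M/M/c queue with per-server rate mu."""
    a = lam / mu
    rho = a / c
    if not 0 < rho < 1:
        raise ValueError("need lam < c * mu")
    head = sum(a ** j / math.factorial(j) for j in range(c))
    tail = a ** c / math.factorial(c) / (1 - rho)
    prob_wait = tail / (head + tail)            # Erlang C, Eq. (12.51)
    mean_w = prob_wait / (c * mu * (1 - rho))   # Eq. (12.53)
    return mean_w, mean_w + 1 / mu              # Eq. (12.54)


w1, t1 = mmc_mean_wait_and_delay(lam=0.5, mu=1.0, c=1)   # System 1
w2, t2 = mmc_mean_wait_and_delay(lam=0.5, mu=0.5, c=2)   # System 2
print(w1, t1, w2, t2)
```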


12.4.2 Waiting Time Distribution for M/M/c


Before we compute the pdf of the waiting time, consider the conditional probability that
there are j - c ≥ 0 customers in queue given that all servers are busy (i.e., N(t) ≥ c):

$$\begin{aligned} P[N(t) = j \mid N(t) \ge c] &= \frac{P[N(t) = j, N(t) \ge c]}{P[N(t) \ge c]} = \frac{P[N(t) = j]}{P[N(t) \ge c]} \\ &= \frac{\rho^{j-c}p_c}{p_c/(1-\rho)} = (1-\rho)\rho^{j-c}, \qquad j \ge c. \end{aligned} \tag{12.56}$$

This geometric pmf suggests that when all the servers are busy, the M/M/c system
behaves like an M/M/1 system. We use this fact to compute the cdf of W.

Suppose that a customer arrives when there are k customers in queue. There must
be k + 1 service completions before our customer enters service. From Eq. (12.43),
each service completion is exponentially distributed with rate cμ. Thus the waiting time
for our customer is the sum of k + 1 iid exponential random variables with parameter
cμ, which we know is a gamma random variable with parameter cμ:

$$f_W(x \mid N = c + k) = \frac{(c\mu x)^k}{k!}c\mu e^{-c\mu x}. \tag{12.57}$$

The cdf for W given that W > 0, or equivalently N ≥ c, is obtained by combining
Eqs. (12.56) and (12.57):

$$\begin{aligned} F_W(x \mid W > 0) &= \sum_{k=0}^{\infty} F_W(x \mid N = c + k)\,P[N = c + k \mid N \ge c] \\ &= \sum_{k=0}^{\infty}\int_0^x \frac{(c\mu y)^k}{k!}c\mu e^{-c\mu y}\,dy\,(1-\rho)\rho^k \\ &= (1-\rho)c\mu\int_0^x e^{-c\mu y}\sum_{k=0}^{\infty}\frac{(c\mu\rho y)^k}{k!}\,dy \\ &= (1-\rho)c\mu\int_0^x e^{-c\mu(1-\rho)y}\,dy \\ &= 1 - e^{-c\mu(1-\rho)x}. \end{aligned}$$

The cdf of W is then

$$\begin{aligned} P[W \le x] &= P[W = 0] + F_W(x \mid W > 0)\,P[W > 0] \\ &= (1 - C(c, a)) + (1 - e^{-c\mu(1-\rho)x})\,C(c, a) \\ &= 1 - C(c, a)e^{-c\mu(1-\rho)x}, \qquad x > 0. \end{aligned} \tag{12.58}$$
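
Equation (12.58) is easy to evaluate and to invert. The following sketch takes the Erlang C value as an input named `prob_wait` (an assumed parameter name) and checks the result against the M/M/1 special case, where C(1, a) = ρ; the numerical values are illustrative.

```python
import math


def wait_cdf(x, c, mu, rho, prob_wait):
    """P[W <= x] from Eq. (12.58); prob_wait is C(c, a) = P[W > 0]."""
    if x < 0:
        return 0.0
    return 1 - prob_wait * math.exp(-c * mu * (1 - rho) * x)


def wait_percentile(p, c, mu, rho, prob_wait):
    """Smallest x >= 0 with P[W <= x] >= p, for 0 <= p < 1."""
    if p <= 1 - prob_wait:
        return 0.0                    # the probability mass at W = 0 covers p
    return math.log(prob_wait / (1 - p)) / (c * mu * (1 - rho))


# M/M/1 special case: c = 1, mu = 1, rho = 0.5, so P[W > 0] = rho = 0.5.
x90 = wait_percentile(0.9, c=1, mu=1.0, rho=0.5, prob_wait=0.5)
print(wait_cdf(0.0, 1, 1.0, 0.5, 0.5), x90, wait_cdf(x90, 1, 1.0, 0.5, 0.5))
```

The jump of size 1 - C(c, a) at x = 0 reflects customers who find a free server and do not wait at all.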


Since T = W + τ, where W and τ are independent random variables, it is easy to show
that if a ≠ c - 1, the cdf of T is

$$P[T \le x] = 1 + \frac{a - c + P[W = 0]}{c - 1 - a}e^{-\mu x} + \frac{C(c, a)}{c - 1 - a}e^{-c\mu(1-\rho)x}. \tag{12.59}$$


Example 12.10 


What is the probability that a packet has to wait more than one minute in the system discussed 
in Example 12.8? 
In Example 12.8 we found that $p_0$ = 1/7 and that the probability of having to wait is

$$C(2, 1.5) = \frac{9}{14}.$$
The probability of having to wait more than one minute is 
PW >1]=1- P[W = 1] 
9 
= —cu(1—p)1 — 7 ,-200(1/4)(0.040) 
C(c, a)e 14° 
ee = 
ae = 0.3045 


The M/M/c/c Queueing System 


The M/M/c/c queueing system has c servers but no waiting room. Customers that arrive 
when all servers are busy are turned away. The transition rate diagram for this system 
is shown in Fig. 12.12, where it can be seen that the arrival rate is zero when N(t) = c. 

The steady state probabilities for this system have the same form as those for
states 0, ..., c in the M/M/c system:

$$p_j = \frac{a^j}{j!}p_0, \qquad j = 0, \ldots, c, \tag{12.60}$$

where

$$a = \frac{\lambda}{\mu} \tag{12.61}$$

is the offered load and

$$p_0 = \left\{\sum_{j=0}^{c}\frac{a^j}{j!}\right\}^{-1}. \tag{12.62}$$

FIGURE 12.12
Transition rate diagram for M/M/c/c system.

The Erlang B formula is defined as the probability that all servers are busy:

$$B(c, a) = P[N = c] = p_c = \frac{a^c/c!}{1 + a + a^2/2! + \cdots + a^c/c!}. \tag{12.63}$$

The actual arrival rate into the system is then

$$\lambda_a = \lambda(1 - B(c, a)). \tag{12.64}$$

The average number in the system is obtained from Little's formula:

$$E[N] = \lambda_a E[\tau] = \frac{\lambda}{\mu}(1 - B(c, a)). \tag{12.65}$$

Note that E[N] is also equal to the carried load as defined by Eq. (12.42).

The Erlang B formula depends only on the arrival rate λ, the mean service time
E[τ] = 1/μ, and the number of servers c. It turns out that Eq. (12.63) also gives the
probability of blocking for M/G/c/c systems (see Ross, 1983).


Example 12.11 


A company has five 1 Megabit per second lines to carry videoconferences between two compa- 
ny sites. Suppose that each videoconference requires 1 Mbps and lasts for an average of 1 hour. 
Assume that requests for videoconferences arrive according to a Poisson process with rate 3 calls 
per hour. Find the probability that a call request is blocked due to lack of lines. 

The offered load is a = λ/μ = 3 calls/hr × 1 hr/call = 3. The blocking probability is then:

$$B(5, 3) = \frac{3^5/5!}{1 + 3 + 9/2 + 27/6 + 81/24 + 243/120} = 0.11.$$
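
The blocking probability in Example 12.11 can be computed either from Eq. (12.63) directly or with the standard recursion B(n, a) = aB(n-1, a)/(n + aB(n-1, a)), B(0, a) = 1. The recursion is not derived in the text above; it is used here as a well-known, numerically stable alternative, and both function names are assumptions.

```python
import math


def erlang_b_direct(c, a):
    """Erlang B formula evaluated term by term, Eq. (12.63)."""
    terms = [a ** j / math.factorial(j) for j in range(c + 1)]
    return terms[c] / sum(terms)


def erlang_b(c, a):
    """Erlang B via the recursion B(n) = a B(n-1) / (n + a B(n-1)), B(0) = 1."""
    b = 1.0
    for n in range(1, c + 1):
        b = a * b / (n + a * b)
    return b


# Example 12.11: five lines, offered load of 3.
print(erlang_b_direct(5, 3.0), erlang_b(5, 3.0))
```

The recursion avoids large factorials and powers, which matters when the number of servers is in the hundreds.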


The M/M/∞ Queueing System


Consider a system with Poisson arrivals and exponential service times, and suppose
that the number of servers is so large that arriving customers always find a server
available. In effect we have a system with an infinite number of servers. If we allow c
to approach infinity for the M/M/c/c system, we obtain the M/M/∞ system with the
transition rate diagram shown in Fig. 12.13.

FIGURE 12.13
Transition rate diagram for M/M/∞ system.

The steady state probabilities are also found by letting c approach infinity in the
equations for the M/M/c/c system:

$$p_j = \frac{a^j}{j!}e^{-a}, \qquad j = 0, 1, 2, \ldots, \tag{12.66}$$

where a = λ/μ. Thus the number of customers in the system is a Poisson random
variable. The mean number of customers in the system is

$$E[N] = a.$$


Example 12.12 


Subscribers connect to a university’s online catalog at a rate of 4 subscribers per minute. Sessions 
have an average duration of 5 minutes. Find the probability that there are more than 25 users online. 
The offered load is a = λ/μ = 4 subscribers/minute × 5 minutes/subscriber = 20. The
number of users connected is a Poisson random variable with mean 20. The
probability that there are more than 25 in the system is:

$$P[N > 25] = 1 - \sum_{j=0}^{25}\frac{20^j}{j!}e^{-20} = 1 - 0.888 = 0.112,$$

where we evaluated the sum with the Octave function poisson_cdf(25, 20).
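
The same Poisson cdf can be evaluated without a statistics package. The sketch below sums the pmf of Eq. (12.66) term by term; the function name `poisson_cdf` mirrors the Octave call but is defined locally here.

```python
import math


def poisson_cdf(n, mean):
    """P[N <= n] for a Poisson random variable with the given mean."""
    term = math.exp(-mean)        # P[N = 0]
    total = term
    for j in range(1, n + 1):
        term *= mean / j          # P[N = j] from P[N = j - 1]
        total += term
    return total


cdf_25 = poisson_cdf(25, 20)
tail_25 = 1 - cdf_25              # P[N > 25] for Example 12.12
print(cdf_25, tail_25)
```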


12.5 FINITE-SOURCE QUEUEING SYSTEMS


Consider a single-server queueing system that serves K sources as shown in Fig. 12.14(a). 
Each source can be in one of two states: In the first state, the source is preparing a 
request for service from the server; in the second state, the source has generated a re- 
quest that is either waiting in queue or being served. For example, the sources could 
represent K machines and the server could represent a repairman who repairs machines 
when they break down. In another example, the K sources could represent clients that 
generate queries for a Web server as shown in Fig. 12.14(b). 


FIGURE 12.14
(a) A finite-source single-server system. (b) A multi-user computer system.

FIGURE 12.15
Transition rate diagram for a finite-source single-server system.

Let N(t) be the number of requests in the system. We assume that each source
spends an exponentially distributed amount of time with mean 1/α preparing each service
request. Thus when idle, a source generates a request for service in the interval
(t, t + δ) with probability αδ + o(δ). If the state of the system is N(t) = k, then the
number of idle sources is K - k, so the rate at which service requests are generated is
(K - k)α. We also assume that the time required to service each request is an exponentially
distributed amount of time with mean 1/μ. N(t) is then a continuous-time
Markov chain with the transition rate diagram shown in Fig. 12.15.

The steady state probabilities are found using the results obtained in Example
11.40:

$$p_k = \frac{K!}{(K-k)!}\left(\frac{\alpha}{\mu}\right)^k p_0, \qquad k = 0, 1, \ldots, K, \tag{12.67}$$

where

$$p_0 = \left\{\sum_{k=0}^{K}\frac{K!}{(K-k)!}\left(\frac{\alpha}{\mu}\right)^k\right\}^{-1}. \tag{12.68}$$

We first compute the mean arrival rate λ and the mean delay E[T] indirectly. In
the last part of the section we show how they can be calculated directly. The server
utilization ρ is the proportion of time when the system is busy, thus

$$\rho = 1 - p_0, \tag{12.69}$$

where $p_0$ is given by Eq. (12.68). The mean arrival rate to the queue can then be found
from Little's formula with "system" defined as the server:

$$\lambda E[\tau] = \rho = 1 - p_0,$$

which implies

$$\lambda = \frac{\rho}{E[\tau]} = \mu\rho = \mu(1 - p_0). \tag{12.70}$$

A source takes an average time of 1/α to generate a request and then spends time
E[T] having it serviced in the queueing system. Thus each source generates a request at
the rate $(1/\alpha + E[T])^{-1}$ requests per second. Since the actual arrival rate must equal
the rate at which the K sources generate requests, we have

$$\lambda = \frac{K}{1/\alpha + E[T]}. \tag{12.71}$$

The mean delay in the system for each request is found by solving for E[T]:

$$E[T] = \frac{K}{\lambda} - \frac{1}{\alpha}. \tag{12.72}$$

Finally, we can apply Little's formula to Eq. (12.72) to obtain the mean number in
the system:

$$E[N] = \lambda E[T] = K - \frac{\lambda}{\alpha}. \tag{12.73}$$

Note that this implies that λ/α is the mean number of idle sources. The mean waiting
time is obtained by subtracting the mean service time from E[T]:

$$E[W] = E[T] - \frac{1}{\mu}. \tag{12.74}$$

The proportion of time that a source spends waiting for the completion of a service
request is the ratio of the time spent in the system to the mean cycle time:

$$P[\text{source busy}] = \frac{E[T]}{E[T] + 1/\alpha}. \tag{12.75}$$


Example 12.13 Web Server System 


Some Web server designs place a limit K on the number of clients that can interact with it at any 
given time. The set of K clients generate queries to the Web server as follows. Each client spends an 
exponentially distributed “think” time preparing a transaction request, and the server takes an ex- 
ponentially distributed time processing each request. The “throughput” of the server is defined as 
the rate at which it completes transactions. The response time is the total time a transaction spends 
in the server. Find expressions for the throughput and response time for two extreme cases: K small 
and K large. 
When K is sufficiently small, there is no waiting in queue, so

$$E[T] = \frac{1}{\mu} \qquad \text{for } K \text{ small}, \tag{12.76}$$

and by Eq. (12.71),

$$\lambda = \frac{K}{1/\alpha + 1/\mu} \qquad \text{for } K \text{ small}. \tag{12.77}$$

Thus λ grows linearly with K. As K increases, the server eventually becomes fully utilized, and
then answers queries at its maximum rate, namely μ transactions per second. Thus

$$\lambda \approx \mu \qquad \text{for } K \text{ large}, \tag{12.78}$$

and Eq. (12.72) becomes

$$E[T] = \frac{K}{\mu} - \frac{1}{\alpha} \qquad \text{for } K \text{ large}. \tag{12.79}$$

FIGURE 12.16
Delay and throughput for finite-source system as a function of number of sources. Dashed lines show small-K and large-K
asymptotes.

These asymptotic expressions for the throughput and response time are shown in Fig. 12.16(a) and
(b). The value of K where the two asymptotes for E[T] intersect is called the system saturation point,

$$K^* = \frac{1/\mu + 1/\alpha}{1/\mu}. \tag{12.80}$$

When K becomes larger than K*, the queries from different clients are certain to interfere with
one another and the response time increases accordingly.
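
The exact throughput and response time can be compared with the two asymptotes. The sketch below evaluates Eqs. (12.67) through (12.73) and (12.80); the function names and the parameter values (think rate α = 0.1, service rate μ = 1) are assumptions for illustration.

```python
import math


def finite_source_pmf(K, alpha, mu):
    """Steady-state pmf of Eqs. (12.67) and (12.68)."""
    r = alpha / mu
    weights = [math.factorial(K) // math.factorial(K - k) * r ** k
               for k in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def finite_source_metrics(K, alpha, mu):
    """Return (throughput, E[T], E[N]) for the finite-source system."""
    pmf = finite_source_pmf(K, alpha, mu)
    lam = mu * (1 - pmf[0])           # Eq. (12.70)
    mean_t = K / lam - 1 / alpha      # Eq. (12.72)
    mean_n = K - lam / alpha          # Eq. (12.73)
    return lam, mean_t, mean_n


def saturation_point(alpha, mu):
    """K* of Eq. (12.80), where the two E[T] asymptotes intersect."""
    return (1 / mu + 1 / alpha) / (1 / mu)


alpha, mu = 0.1, 1.0
print(saturation_point(alpha, mu))
for K in (1, 5, 11, 20, 50):
    print(K, finite_source_metrics(K, alpha, mu))
```

For K = 1 the exact response time equals 1/μ, matching Eq. (12.76), and for K well beyond K* the throughput approaches μ as in Eq. (12.78).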


*12.5.1 Arriving Customer's Distribution


In the above discussion, we found λ, E[N], and E[T] in a roundabout way (see Eqs.
12.70, 12.71, and 12.72). To calculate E[T] directly, we argue as follows. If we assume a
first-come, first-served service discipline, then a customer who arrives when there are
$N_a = k$ requests in the queueing system spends a total time in the system equal to the
sum of 1 residual service time, k - 1 service times, and the customer's own service
time. Since all of these times are iid exponential random variables with mean 1/μ, the
mean time in the system for our request is

$$E[T \mid N_a = k] = \frac{k+1}{\mu}.$$

The mean time in the system is then found by averaging over $N_a$:

$$E[T] = \frac{1}{\mu}\sum_{k=0}^{K-1}(k+1)P[N_a = k]. \tag{12.81}$$

The difficulty with the above equation is that arrivals are not Poisson: remember
that the arrival rate is (K - N(t))α, and thus depends on the state of the system.
Consequently, the distribution of states seen by an arriving customer is not the same as
P[N = k], the proportion of time that there are k requests in the queueing system. For
example, a service request cannot be generated when all sources have requests in the
system, that is, N(t) = K, so $P[N_a = K] = 0$. However, P[N = K] is nonzero since it
is possible for all sources to have requests in the queueing system simultaneously.

To find $P[N_a = k]$ we need to find the long-term proportion of time that arriving
customers find k customers in the system. Since $p_k = P[N(t) = k]$ is the long-term
proportion of time the system is in state k, then in a very long time interval of duration
T' approximately $p_k T'$ seconds are spent in state k. The arrival rate when N(t) = k is
(K - k)α requests per second, so the number of arrivals that find k requests is
approximately

$$(K - k)\alpha \text{ customers/second} \times p_k T' \text{ seconds in state } k. \tag{12.82}$$

The total number of arrivals in time T' is obtained by summing over all states:

$$\sum_{j=0}^{K}(K - j)\alpha p_j T'. \tag{12.83}$$

Thus the proportion of arrivals that find k requests in the system is

$$\begin{aligned} P[N_a = k] &= \frac{(K-k)\alpha p_k T'}{\sum_{j=0}^{K}(K-j)\alpha p_j T'} = \frac{(K-k)p_k}{\sum_{j=0}^{K}(K-j)p_j} \\ &= \frac{(K-k)[K!/(K-k)!](\alpha/\mu)^k p_0}{\sum_{j=0}^{K}(K-j)[K!/(K-j)!](\alpha/\mu)^j p_0} \\ &= \frac{[(K-1)!/(K-k-1)!](\alpha/\mu)^k}{\sum_{j=0}^{K-1}[(K-1)!/(K-j-1)!](\alpha/\mu)^j}, \qquad 0 \le k \le K-1. \end{aligned} \tag{12.84}$$

If we compare Eq. (12.84) with Eq. (12.67), we see that Eq. (12.84) is the steady state
probability of having k customers in a system with K - 1 sources. In other words, a
source when placing a request "sees" a queueing system that behaves as if the source
were not present at all!

We leave it up to you in Problem 12.37 to show that Eqs. (12.84) and (12.81) give
E[T] as given in Eq. (12.72). Indeed, this same approach can be used to find the pdf of T.
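
Both claims above can be checked numerically before attempting the proof. The sketch below builds the arriving customer's distribution from the first line of Eq. (12.84), compares it with the steady state pmf of a system with K - 1 sources, and compares E[T] from Eq. (12.81) with Eq. (12.72). The function names and the parameter values are illustrative assumptions.

```python
import math


def finite_source_pmf(K, alpha, mu):
    """Steady-state pmf of Eqs. (12.67) and (12.68)."""
    r = alpha / mu
    weights = [math.factorial(K) // math.factorial(K - k) * r ** k
               for k in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def arrival_pmf(K, alpha, mu):
    """P[N_a = k] for k = 0..K-1, weighting p_k by the arrival rate (K - k) alpha."""
    pmf = finite_source_pmf(K, alpha, mu)
    weights = [(K - k) * p for k, p in enumerate(pmf)]
    total = sum(weights)
    return [w / total for w in weights[:K]]   # state K is never seen by an arrival


K, alpha, mu = 5, 0.3, 1.0
seen = arrival_pmf(K, alpha, mu)
one_fewer = finite_source_pmf(K - 1, alpha, mu)

# E[T] by direct averaging, Eq. (12.81), and indirectly, Eqs. (12.70) and (12.72).
mean_t_direct = sum((k + 1) * p for k, p in enumerate(seen)) / mu
lam = mu * (1 - finite_source_pmf(K, alpha, mu)[0])
mean_t_indirect = K / lam - 1 / alpha
print(mean_t_direct, mean_t_indirect)
```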


12.6 M/G/1 QUEUEING SYSTEMS


We now consider single-server queueing systems in which the arrivals follow a Poisson
process but in which the service times need not be exponentially distributed. We assume
that the service times are independent, identically distributed random variables
with general pdf $f_\tau(x)$. The resulting queueing system is denoted by M/G/1.

The number of customers N(t) in an M/G/1 system is a continuous-time random
process. Recall that the “state” of the system is the information about the past history
of the system that is relevant to the probabilities of future events. In the preceding sections,
customer interarrival times and service times were exponentially distributed, so
N(t) was always the state of the system. This is no longer the case for M/G/1 systems.
For example, if service times are constant, then knowledge about when a customer
began service specifies the customer’s future departure time. Thus the state of an
M/G/1 system at time t is specified by N(t) together with the remaining (“residual”)
service time of the customer being served at time t.

In this section we present a simple approach based on Little’s formula that gives
the mean waiting time and mean delay in an M/G/1 system. We also use this simple
approach to find the mean waiting times in M/G/1 systems that have priority classes.

FIGURE 12.17
Sequence of service times and a residual service time.


12.6.1 The Residual Service Time


Suppose that an arriving customer finds the server busy, and consider the residual time
of the customer found in service. Let $\tau_1, \tau_2, \ldots$ be the iid sequence of service times of
customers in this M/G/1 system, and suppose we divide the positive time axis into
segments of length $\tau_1, \tau_2, \ldots$ as shown in Fig. 12.17. We can then view customers who
arrive when the server is busy as picking a point at random on this time axis. The residual
service time is then the remainder of time in the segment that is intercepted, as shown
in Fig. 12.17.

In Example 7.21 we showed that the long-term proportion of time that the residual
service time exceeds $x$ is given by

$$\frac{1}{E[\tau]} \int_x^\infty (1 - F_\tau(y))\,dy. \tag{12.85}$$

Since the arrival times of Poisson customers are independent of the system state,
Eq. (12.85) is also the probability that the residual service time $R$ of a customer found
in service exceeds $x$, that is,

$$P[R > x] = \frac{1}{E[\tau]} \int_x^\infty (1 - F_\tau(y))\,dy. \tag{12.86}$$

The pdf of $R$ is then

$$f_R(x) = -\frac{d}{dx} P[R > x] = \frac{1 - F_\tau(x)}{E[\tau]}. \tag{12.87}$$

The mean residual service time is $E[R] = \int_0^\infty x f_R(x)\,dx$. Integrating by parts with $u = (1 - F_\tau(x))/E[\tau]$ and $dv = x\,dx$, we obtain

$$E[R] = \left.(1 - F_\tau(x))\frac{x^2}{2E[\tau]}\right|_0^\infty + \frac{1}{2E[\tau]}\int_0^\infty x^2 f_\tau(x)\,dx = \frac{E[\tau^2]}{2E[\tau]}. \tag{12.88}$$


Example 12.14 


Compare the residual service times of two systems with exponential service times of mean m and 
constant service times of mean m, respectively. 

For an exponential service time of mean $m$, the second moment is $2m^2$; thus the mean
residual service time is, from Eq. (12.88),

$$E[R_{\text{exp}}] = \frac{2m^2}{2m} = m.$$

Thus the mean residual time is the same as the full service time of a customer. This is consistent
with the memoryless property of the exponential random variable.

The second moment of a constant random variable of value $m$ is $m^2$. Thus the mean
residual service time is

$$E[R_{\text{const}}] = \frac{m^2}{2m} = \frac{m}{2},$$

which is what one would expect; on the average we expect to wait half a service time.
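
The comparison in Example 12.14 can be reproduced numerically. The following is a minimal sketch of Eq. (12.88); the function name `mean_residual_service_time` and the value `m = 3.0` are illustrative choices, not part of the text.

```python
def mean_residual_service_time(mean, second_moment):
    """Mean residual service time E[R] = E[tau^2] / (2 E[tau]), Eq. (12.88)."""
    if mean <= 0:
        raise ValueError("mean service time must be positive")
    return second_moment / (2.0 * mean)


m = 3.0  # illustrative mean service time
r_exp = mean_residual_service_time(m, 2 * m**2)  # exponential: E[tau^2] = 2 m^2
r_const = mean_residual_service_time(m, m**2)    # constant:    E[tau^2] = m^2
print(r_exp, r_const)
```

Under these assumptions the exponential case returns the full mean `m` and the constant case returns `m / 2`, in line with the example.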


Mean Delay in M/G/1 Systems 


Consider the time $W$ spent by a customer waiting for service in an M/G/1 system. If the
service discipline is first come, first served, then $W$ is the sum of the residual service
time $R'$ of the customer (if any) found in service and the $N_q(t) = k - 1$ service times
of the customers (if any) found in queue. The mean waiting time is then

$$E[W] = E[R'] + E[N_q(t)]E[\tau], \tag{12.89}$$

since the service times are iid with mean $E[\tau]$ (see Eq. 7.13). From Little’s formula we
have that $E[N_q(t)] = \lambda E[W]$, so

$$E[W] = E[R'] + \lambda E[W]E[\tau] = E[R'] + \rho E[W]. \tag{12.90}$$

The residual service time $R'$ encountered by an arriving customer is zero when
the system is found empty, and $R$, as defined in the previous section, when a customer
is found in service. Thus

$$E[R'] = 0 \cdot P[N(t) = 0] + E[R](1 - P[N(t) = 0]) = \frac{E[\tau^2]}{2E[\tau]}\,\rho = \frac{\lambda E[\tau^2]}{2}, \tag{12.91}$$

where we have used Eq. (12.88) for $E[R]$ and Eq. (12.14) for the fact that
$1 - P[N(t) = 0] = \rho = \lambda E[\tau]$.

The mean waiting time $E[W]$ of a customer in an M/G/1 system is found by
substituting Eq. (12.91) into Eq. (12.90) and solving for $E[W]$:

$$E[W] = \frac{\lambda E[\tau^2]}{2(1 - \rho)}. \tag{12.92}$$

We can obtain another expression for $E[W]$ by noting that $E[\tau^2] = \sigma_\tau^2 + E[\tau]^2$:

$$E[W] = \frac{\lambda(\sigma_\tau^2 + E[\tau]^2)}{2(1 - \rho)} = \frac{\lambda E[\tau]^2(1 + C_\tau^2)}{2(1 - \rho)} = \frac{\rho(1 + C_\tau^2)}{2(1 - \rho)}\,E[\tau], \tag{12.93}$$

where $C_\tau^2 = \sigma_\tau^2/E[\tau]^2$ is the squared coefficient of variation of the service time. Equation
(12.93) is called the Pollaczek-Khinchin mean value formula.
The mean delay $E[T]$ is found by adding the mean service time to $E[W]$:

$$E[T] = E[\tau] + E[\tau]\,\frac{\rho(1 + C_\tau^2)}{2(1 - \rho)}. \tag{12.94}$$

From Eqs. (12.93) and (12.94) we can see that the mean waiting time and mean delay
are affected not only by the mean service time and the server utilization but also
by the coefficient of variation of the service time. Thus the degree of randomness of the
service times as measured by $C_\tau^2$ affects these delays.¹


Example 12.15 


Compare $E[W]$ for the M/M/1 and M/D/1 systems. The second moments of the exponential and
constant random variables were found in Example 12.14. The exponential service time has a
coefficient of variation equal to one. Thus Eq. (12.93) implies

$$E[W_{M/M/1}] = \frac{\rho}{1 - \rho}\,E[\tau]. \tag{12.95}$$

The constant service time has zero variance, so its coefficient of variation is zero. Thus

$$E[W_{M/D/1}] = \frac{\rho}{2(1 - \rho)}\,E[\tau]. \tag{12.96}$$

Thus we see that the mean waiting time in an M/D/1 system is half that in an M/M/1 system.
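
The mean value formula is easy to evaluate directly. The sketch below implements Eqs. (12.92) and (12.94) and repeats the M/M/1 versus M/D/1 comparison; the function names and the numerical values (`lam = 0.8`, `m = 1.0`) are assumptions made for illustration only.

```python
def pk_mean_wait(lam, mean, second_moment):
    """Mean waiting time E[W] = lam E[tau^2] / (2 (1 - rho)), Eq. (12.92)."""
    rho = lam * mean
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization rho must satisfy 0 <= rho < 1")
    return lam * second_moment / (2.0 * (1.0 - rho))


def pk_mean_delay(lam, mean, second_moment):
    """Mean delay E[T] = E[tau] + E[W], Eq. (12.94)."""
    return mean + pk_mean_wait(lam, mean, second_moment)


lam, m = 0.8, 1.0                       # illustrative values, rho = 0.8
w_mm1 = pk_mean_wait(lam, m, 2 * m**2)  # exponential service, C^2 = 1
w_md1 = pk_mean_wait(lam, m, m**2)      # constant service, C^2 = 0
print(w_mm1, w_md1)
```

Passing the first two moments rather than a distribution object reflects the observation that only these two moments enter $E[W]$ and $E[T]$.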


¹On the other hand, it is rather surprising that only the first two moments of the distribution of the service
time affect $E[W]$ and $E[T]$.

12.6.3 Mean Delay in M/G/1 Systems with Priority Service Discipline


Consider a queueing system that handles $K$ priority classes of customers. Type $k$ customers
arrive according to a Poisson process of rate $\lambda_k$ and have service times with
pdf $f_{\tau_k}(x)$ and mean $E[\tau_k]$. A separate queue is kept for each priority class, and each
time the server becomes available it selects the next customer from the highest-priority
nonempty queue. This service discipline is often referred to as “head-of-line
priority service.” We assume that customers cannot be preempted once their service
has begun.
The server utilization from type $k$ customers is

$$\rho_k = \lambda_k E[\tau_k].$$

We assume that the total server utilization is less than 1:

$$\rho = \rho_1 + \cdots + \rho_K < 1. \tag{12.97}$$

If this is not the case, one or more of the lower-priority queues become unstable, that is,
grow without bound.

Consider the waiting time $W_1$ of the highest-priority (type 1) customer. If
an arriving type 1 customer finds $N_{q1}(t) = k_1$ type 1 customers in queue and if the
service discipline is first come, first served within each class, then $W_1$ is the sum of the
residual service time $R''$ of the customer (if any) found in service and the $N_{q1}(t) = k_1$
service times of the type 1 customers (if any) found in queue. Thus

$$E[W_1] = E[R''] + E[N_{q1}]E[\tau_1].$$

Following the same development that followed Eq. (12.89) in the previous section, we
arrive at the following expression for the mean waiting time for type 1 customers:

$$E[W_1] = \frac{E[R'']}{1 - \rho_1}. \tag{12.98}$$


If an arriving type 2 customer finds $N_{q1}(t) = k_1$ type 1 and $N_{q2}(t) = k_2$ type 2
customers waiting in queue, then $W_2$ is the sum of the residual service time $R''$ of the
customer (if any) found in service, the $k_1$ service times of the type 1 customers (if any)
found in queue, the service times of the $k_2$ type 2 customers found in queue, and the
service times of the higher-priority type 1 customers who arrive while our customer is
waiting in queue. Thus

$$E[W_2] = E[R''] + E[N_{q1}]E[\tau_1] + E[N_{q2}]E[\tau_2] + E[M_1]E[\tau_1], \tag{12.99}$$

where $M_1$ denotes the number of type 1 arrivals during our customer’s waiting time. By
Little’s formula we have $E[N_{q1}] = \lambda_1 E[W_1]$ and $E[N_{q2}] = \lambda_2 E[W_2]$. In addition, the
mean number of type 1 arrivals during $E[W_2]$ seconds is $E[M_1] = \lambda_1 E[W_2]$. Substituting
these expressions in Eq. (12.99) gives

$$E[W_2] = E[R''] + \rho_1 E[W_1] + \rho_2 E[W_2] + \rho_1 E[W_2].$$

Solving for $E[W_2]$,

$$E[W_2] = \frac{E[R''] + \rho_1 E[W_1]}{1 - \rho_1 - \rho_2} = \frac{E[R'']}{(1 - \rho_1)(1 - \rho_1 - \rho_2)}, \tag{12.100}$$

where we have used Eq. (12.98) for $E[W_1]$.
If there are more than two classes of customers, the above method can be used to
show that the mean waiting time for a type $k$ customer is

$$E[W_k] = \frac{E[R'']}{(1 - \rho_1 - \cdots - \rho_{k-1})(1 - \rho_1 - \cdots - \rho_k)}. \tag{12.101}$$


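The customer found in service by an arriving customer can be of any type, so $R''$
is the residual service time of customers of all types:

$$E[R''] = \frac{\lambda E[\tau^2]}{2}, \tag{12.102}$$

where $\lambda$ is the total arrival rate,

$$\lambda = \lambda_1 + \cdots + \lambda_K, \tag{12.103}$$

and $E[\tau^2]$ is the second moment of the service time of customers of all types. The fraction
of customers who are type $k$ is $\lambda_k/\lambda$, thus

$$E[\tau^2] = \frac{\lambda_1}{\lambda}E[\tau_1^2] + \cdots + \frac{\lambda_K}{\lambda}E[\tau_K^2]. \tag{12.104}$$

We finally arrive at the following expression for the mean waiting time for type $k$
customers:

$$E[W_k] = \frac{\sum_{j=1}^{K} \lambda_j E[\tau_j^2]}{2(1 - \rho_1 - \cdots - \rho_{k-1})(1 - \rho_1 - \cdots - \rho_k)}. \tag{12.105}$$

The mean delay for type $k$ customers is then

$$E[T_k] = E[W_k] + E[\tau_k]. \tag{12.106}$$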


Equation (12.105) reveals the effect of the priority classes on one another. Class
$k$ customers are affected by lower-priority customers only through the residual-service-time
term in the numerator. On the other hand, if the server utilization of the first
$k - 1$ classes exceeds one, then the queue for class $k$ customers is unstable.
Example 12.16


A computer handles two types of jobs. Type 1 jobs require a constant service time of 1 ms, and
type 2 jobs require an exponentially distributed amount of time with mean 10 ms. Find the mean
waiting time if the system operates as follows: (1) an ordinary M/G/1 system and (2) a two-priority
M/G/1 system with priority given to type 1 jobs. Assume that the arrival processes of the two classes
are Poisson with the same rate.

The first two moments of the service time are

$$E[\tau] = \tfrac{1}{2}E[\tau_1] + \tfrac{1}{2}E[\tau_2] = 5.5$$

$$E[\tau^2] = \tfrac{1}{2}E[\tau_1^2] + \tfrac{1}{2}E[\tau_2^2] = \tfrac{1}{2}\left(1^2 + 2(10^2)\right) = 100.5.$$

The traffic intensity for each class and the total traffic intensity are

$$\rho_1 = 1 \cdot \frac{\lambda}{2}, \qquad \rho_2 = 10\,\frac{\lambda}{2}, \qquad \text{and} \qquad \rho = \lambda E[\tau] = 5.5\lambda,$$

where $\lambda$ is the total arrival rate. The mean residual service time seen by an arriving customer is then

$$E[R''] = \frac{\lambda E[\tau^2]}{2} = 50.25\lambda.$$

From Eq. (12.92), the mean waiting time for an M/G/1 system is

$$E[W] = \frac{\lambda E[\tau^2]}{2(1 - \rho)} = \frac{50.25\lambda}{1 - 5.5\lambda}. \tag{12.107}$$

For the priority system we have

$$E[W_1] = \frac{E[R'']}{1 - \rho_1} = \frac{50.25\lambda}{1 - 0.5\lambda} \tag{12.108}$$

and

$$E[W_2] = \frac{E[R'']}{(1 - \rho_1)(1 - \rho)} = \frac{50.25\lambda}{(1 - 0.5\lambda)(1 - 5.5\lambda)}. \tag{12.109}$$

Comparison of Eqs. (12.108) and (12.109) with Eq. (12.107) shows that the waiting time of type
1 customers is improved by a factor of $(1 - \rho)/(1 - \rho_1)$ and that of type 2 is worsened by the
factor $1/(1 - \rho_1)$.

The overall mean waiting time for the priority system is

$$E[W_P] = \tfrac{1}{2}E[W_1] + \tfrac{1}{2}E[W_2] = \frac{E[R'']}{1 - \rho_1}\left(\frac{1}{2} + \frac{1}{2(1 - \rho)}\right) = \frac{1 - 2.75\lambda}{1 - 0.5\lambda}\,E[W],$$

where $E[W]$ is the mean waiting time of the M/G/1 system without priorities. Figure 12.18 shows
$E[W]$, $E[W_1]$, $E[W_2]$, and $E[W_P]$. It can be seen that the discipline “short-job type first” used
here improves the average waiting time. The graphs for $E[W_1]$ and $E[W_2]$ also show that at
$\lambda = 2/11$ the lower-priority queue becomes unstable but the higher-priority queue remains stable up to
$\lambda = 2$.

FIGURE 12.18
Relative mean waiting times in priority and nonpriority M/G/1 systems: $E[W]$, mean waiting time in
M/G/1 system; $E[W_1]$, $E[W_2]$, mean waiting time for type 1 and type 2 customers in priority
system; $E[W_P]$, overall mean waiting time in priority system.
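
The priority formulas can be checked against the numbers in this example. The following is a minimal sketch of Eq. (12.105) for nonpreemptive head-of-line priority; the function name `priority_mean_waits`, the list-based interface, and the arrival rates used are assumptions for illustration. Classes are listed from highest to lowest priority, and times are in ms as in Example 12.16.

```python
def priority_mean_waits(rates, means, second_moments):
    """Mean waiting time E[W_k] of each class, Eq. (12.105).

    rates[k], means[k], second_moments[k] describe class k; index 0 is the
    highest priority. An unstable class is reported as float('inf').
    """
    # E[R''] = (1/2) * sum_j lambda_j E[tau_j^2], Eqs. (12.102) and (12.104)
    residual = sum(l * m2 for l, m2 in zip(rates, second_moments)) / 2.0
    waits = []
    cumulative = 0.0  # rho_1 + ... + rho_k, built up class by class
    for l, m in zip(rates, means):
        previous = cumulative
        cumulative += l * m
        if cumulative >= 1.0:
            waits.append(float("inf"))
        else:
            waits.append(residual / ((1.0 - previous) * (1.0 - cumulative)))
    return waits


lam = 0.1  # illustrative total arrival rate (jobs per ms), split equally
w1, w2 = priority_mean_waits([lam / 2, lam / 2], [1.0, 10.0], [1.0, 200.0])
print(w1, w2)
```

Calling the same function with `lam = 0.2`, which exceeds 2/11, reports an infinite wait for type 2 while type 1 remains finite, matching the stability remark at the end of the example.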


M/G/1 ANALYSIS USING EMBEDDED MARKOV CHAINS 


In the previous section we noted that the state of an M/G/1 queueing system is given by
the number of customers in the system $N(t)$ and the residual service time of the customer
in service. Suppose we observe $N(t)$ at the instants when the residual service
time becomes zero (i.e., at the instants $D_j$ when the $j$th service completion occurs); then
all of the information relevant to the probability of future events is embodied in
$N_j = N(D_j)$, the number of customers left behind by the $j$th departing customer. We will
show that the sequence $N_j$ is a discrete-time Markov chain and that the steady state
pmf at customer departure instants is equal to the steady state pmf of the system at arbitrary
time instants. Thus we can find the steady state pmf of $N(t)$ if we can find the
steady state pmf for the chain $N_j$.


The Embedded Markov Chain 


First we show that the sequence $N_j = N(D_j)$ is a Markov chain. Consider the relation
between $N_j$ and $N_{j-1}$. If $N_{j-1} \geq 1$, then a customer enters service immediately at time
$D_{j-1}$, as shown in Fig. 12.19(a), and $N_j$ equals $N_{j-1}$, minus the customer that is served in
between, plus the number of customers $M_j$ that arrive during the service time of the $j$th
customer:

$$N_j = N_{j-1} - 1 + M_j \qquad \text{if } N_{j-1} \geq 1. \tag{12.110a}$$

If $N_{j-1} = 0$, then as shown in Fig. 12.19(b), there are no departures until the $j$th customer
arrives and completes his service; $N_j$ then is the number of customers who arrive
during this service time:

$$N_j = M_j \qquad \text{if } N_{j-1} = 0. \tag{12.110b}$$

FIGURE 12.19
(a) Customer $j - 1$ leaves the system nonempty at time $D_{j-1}$. (b) Customer $j - 1$ leaves the system empty at
time $D_{j-1}$.

Thus we see that $N_j$ depends on the past only through $N_{j-1}$ and $M_j$. The $M_j$ form an iid
sequence because the service times are iid and because of the memoryless property of
Poisson arrivals. Thus $N_j$ depends on the past of the system only through $N_{j-1}$. We
therefore conclude that the sequence $N_j$ is a Markov chain.
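
The recursion in Eqs. (12.110a) and (12.110b) can be iterated directly. The sketch below does so for an M/D/1 queue, for which the number of arrivals during one constant service time is Poisson with mean $\rho$. The function names, the seed, the run length, and the choice $\rho = 0.5$ are illustrative assumptions, and the Poisson sampler is a simple product-of-uniforms method suited to small means only.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate by multiplying uniforms until the product
    drops below exp(-mean); adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_embedded_chain(rho, steps, seed=1):
    """Iterate N_j = N_{j-1} - 1 + M_j (N_{j-1} >= 1) or N_j = M_j (N_{j-1} = 0)
    and return the sample mean of N_j and the fraction of departures that
    leave the system empty."""
    rng = random.Random(seed)
    n, total, empties = 0, 0, 0
    for _ in range(steps):
        m = sample_poisson(rho, rng)      # arrivals during the jth service time
        n = n - 1 + m if n >= 1 else m    # Eqs. (12.110a) and (12.110b)
        total += n
        empties += (n == 0)
    return total / steps, empties / steps


mean_n, p_empty = simulate_embedded_chain(rho=0.5, steps=200_000)
print(mean_n, p_empty)
```

For a long run the two estimates should settle near $\rho + \rho^2/(2(1 - \rho)) = 0.75$, which follows from Little's formula and Eq. (12.94) with $C_\tau^2 = 0$, and near $1 - \rho = 0.5$, up to sampling error.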

Next we need to show that the steady state pmf of $N(t)$ is the same as the steady
state pmf of $N_j$. We do so in two steps: first, we show that in M/G/1 systems, the distribution
of customers found by arriving customers is the same as that left behind by
departing customers; second, we show that in M/G/1 systems, the distribution of
customers found by arriving customers is the same as the steady state distribution of
$N(t)$. It then follows that the steady state pmf’s of $N(t)$ and $N_j$ are the same.

First we need to show that for systems in which customers arrive one at a time and
depart one at a time (i.e., M/G/1 systems) the distribution found by arriving customers is
the same as that left behind by departing customers. Let $U_n(t)$ be the number of times
the system goes from $n$ to $n + 1$ in the interval $(0, t)$; then $U_n(t)$ is the number of times
an arriving customer finds $n$ customers in the system. Similarly, let $V_n(t)$ be the number
of times that the system goes from $n + 1$ to $n$; then $V_n(t)$ is the number of times a
departing customer leaves $n$. Note that the transition $n$ to $n + 1$ cannot reoccur until
after the number in the system drops to $n$ once more (i.e., until after the transition
$n + 1$ to $n$ reoccurs). Thus $U_n(t)$ and $V_n(t)$ can differ by at most 1. As $t$ becomes large,
both of these transitions occur a large number of times, so the rate of transitions from
$n$ to $n + 1$ equals the rate from $n + 1$ to $n$. Thus the rate at which customer arrivals
find $n$ in the system equals the rate at which departures leave $n$ in the system. It then
follows that the probability that an arrival finds $n$ in the system is equal to the probability
that a departure leaves $n$ behind.

Since the arrivals in an M/G/1 system are Poisson and independent of the customer
service times, the customer arrival times are independent of the state of the system.
Thus the probability that an arrival finds $n$ customers in the system is equal to the
proportion of time the system has $n$ customers, that is, the steady state probability
$P[N(t) = n]$. Thus the distribution of states seen by arriving customers is the same as the
steady state distribution.

By combining the results from the two previous paragraphs, we have that for an
M/G/1 system, the pmf of $N_j$, the state at customer departure points, is the same as the
steady state pmf of $N(t)$. In the next section, we find the generating function of $N_j$ and
thus of $N(t)$.


The Number of Customers in an M/G/1 System 


We now find the generating function for the steady state pmf of $N_j$. The transition
probabilities for $N_j$ can be deduced from Eqs. (12.110a) and (12.110b):

$$p_{ik} = P[N_j = k \mid N_{j-1} = i] = P[M_j = k - i + 1], \qquad i > 0 \tag{12.111a}$$

$$p_{0k} = P[N_j = k \mid N_{j-1} = 0] = P[M_j = k]. \tag{12.111b}$$

Note that $p_{ik} = 0$ for $k - i + 1 < 0$. The probability that there are $N_j = k$ customers
in the system at time $j$ is

$$P[N_j = k] = \sum_{i=0}^{\infty} P[N_{j-1} = i]\,p_{ik} = P[N_{j-1} = 0]P[M_j = k] + \sum_{i=1}^{\infty} P[N_{j-1} = i]P[M_j = k + 1 - i] \tag{12.112a}$$

$$= P[N_{j-1} = 0]P[M_j = k] + \sum_{i=1}^{k+1} P[N_{j-1} = i]P[M_j = k + 1 - i], \tag{12.112b}$$

where we have used the fact that $P[M_j = k + 1 - i] = 0$ for $i > k + 1$.
If the process $N_j$ reaches a steady state as $j \to \infty$, then $P[N_j = k] \to P[N_d = k]$
and the above equation becomes

$$P[N_d = k] = P[N_d = 0]P[M = k] + \sum_{i=1}^{k+1} P[N_d = i]P[M = k + 1 - i], \tag{12.113}$$

where $N_d$ denotes the number of customers left behind by a departing customer.

Since the steady state pmf of $N_d$ is equal to that of $N(t)$, Eq. (12.113) also holds for
the steady state pmf of $N(t)$. Equation (12.113) is readily solved for the generating
function of $N(t)$ by using the probability generating function. The generating functions
for $N$ and for $M$ are given by

$$G_N(z) = \sum_{k=0}^{\infty} P[N = k]z^k \qquad \text{and} \qquad G_M(z) = \sum_{k=0}^{\infty} P[M = k]z^k.$$

We multiply both sides of Eq. (12.113) (with $N_d$ replaced by $N$) by $z^k$ and sum from 0 to
infinity:

$$\sum_{k=0}^{\infty} P[N = k]z^k = \sum_{k=0}^{\infty} P[N = 0]P[M = k]z^k + \sum_{k=0}^{\infty}\sum_{i=1}^{k+1} P[N = i]P[M = k + 1 - i]z^k. \tag{12.114}$$

The generating functions for $N$ and $M$ are immediately recognizable in the first two
summations. Extending the inner sum over all $i \geq 1$ (the added terms are zero) and
interchanging the order of summation gives

$$G_N(z) = P[N = 0]G_M(z) + z^{-1}\sum_{i=1}^{\infty} P[N = i]z^i \sum_{k=0}^{\infty} P[M = k + 1 - i]z^{k+1-i}.$$

The first summation is the generating function for $N$ with the $i = 0$ term missing. Let
$k' = k + 1 - i$ in the second summation and note that $P[M = k'] = 0$ for $k' < 0$, then

$$G_N(z) = P[N = 0]G_M(z) + z^{-1}\{G_N(z) - P[N = 0]\}\sum_{k'=0}^{\infty} P[M = k']z^{k'}$$

$$= P[N = 0]G_M(z) + z^{-1}(G_N(z) - P[N = 0])G_M(z). \tag{12.115}$$

The generating function for $N$ is found by solving for $G_N(z)$:

$$G_N(z) = \frac{P[N = 0](z - 1)G_M(z)}{z - G_M(z)}. \tag{12.116}$$

We can find $P[N = 0]$ by noting that as $z \to 1$, we must have

$$G_N(z) = \sum_{k=0}^{\infty} P[N = k]z^k \to 1. \tag{12.117}$$

When we take the limit $z \to 1$ in Eq. (12.116) we obtain zero for the numerator and the
denominator. By applying L’Hopital’s rule, we obtain

$$1 = P[N = 0]\left.\frac{G_M(z) + (z - 1)G_M'(z)}{1 - G_M'(z)}\right|_{z=1} = \frac{P[N = 0]}{1 - E[M]}. \tag{12.118}$$

Thus

$$P[N = 0] = 1 - E[M] \tag{12.119}$$

and

$$G_N(z) = \frac{(1 - E[M])(z - 1)G_M(z)}{z - G_M(z)}. \tag{12.120}$$

Note from Eq. (12.119) that we must have $E[M] < 1$ since $P[N = 0] \geq 0$. This stability
condition makes sense since it implies that on the average less than one customer
should arrive during the time it takes to service a customer.

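We now determine $G_M(z)$, the generating function for the number of arrivals
during a service time:

$$G_M(z) = \sum_{k=0}^{\infty} P[M = k]z^k = \sum_{k=0}^{\infty}\int_0^{\infty} P[M = k \mid \tau = t]\,f_\tau(t)\,dt\,z^k. \tag{12.121a}$$

Noting that the number of arrivals in $t$ seconds is a Poisson random variable,

$$G_M(z) = \sum_{k=0}^{\infty}\int_0^{\infty} \frac{(\lambda t)^k}{k!}e^{-\lambda t}f_\tau(t)\,dt\,z^k$$

$$= \int_0^{\infty} e^{-\lambda t}f_\tau(t)\sum_{k=0}^{\infty}\frac{(\lambda t z)^k}{k!}\,dt$$

$$= \int_0^{\infty} e^{-\lambda t}f_\tau(t)e^{\lambda t z}\,dt$$

$$= \int_0^{\infty} e^{-\lambda(1 - z)t}f_\tau(t)\,dt$$

$$= \hat{\tau}(\lambda(1 - z)), \tag{12.121b}$$

where $\hat{\tau}(s)$ is the Laplace transform of the pdf of $\tau$:

$$\hat{\tau}(s) = \int_0^{\infty} e^{-st}f_\tau(t)\,dt. \tag{12.122}$$

We can obtain the moments of $M$ by taking derivatives of $G_M(z)$:

$$E[M] = \left.\frac{d}{dz}G_M(z)\right|_{z=1} = \left.\frac{d}{dz}\hat{\tau}(\lambda(1 - z))\right|_{z=1} = \left.\hat{\tau}'(\lambda(1 - z))(-\lambda)\right|_{z=1} = -\lambda\hat{\tau}'(0) = \lambda E[\tau] = \rho, \tag{12.123}$$

where we used the chain rule in the third equality. Similarly,

$$E[M(M - 1)] = \lambda^2\hat{\tau}''(0) = \lambda^2 E[\tau^2].$$

Thus

$$\sigma_M^2 = E[M^2] - E[M]^2 = \lambda^2 E[\tau^2] + \lambda E[\tau] - (\lambda E[\tau])^2 = \lambda^2\sigma_\tau^2 + \lambda E[\tau]. \tag{12.124}$$

If we substitute Eqs. (12.123) and (12.121b) into Eq. (12.120), we obtain the
Pollaczek-Khinchin transform equation,

$$G_N(z) = \frac{(1 - \rho)(z - 1)\hat{\tau}(\lambda(1 - z))}{z - \hat{\tau}(\lambda(1 - z))}. \tag{12.125}$$

Note that $G_N(z)$ depends on the utilization $\rho$, the arrival rate $\lambda$, and the Laplace transform
of the service time pdf.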


Example 12.17 M/M/1 System 


Use the Pollaczek-Khinchin transform formula to find the pmf for $N(t)$ for an M/M/1 system.
The Laplace transform for the pdf of an exponential service time of mean $1/\mu$ is

$$\hat{\tau}(s) = \frac{\mu}{s + \mu}.$$

Thus the Pollaczek-Khinchin transform formula is

$$G_N(z) = \frac{(1 - \rho)(z - 1)[\mu/(\lambda(1 - z) + \mu)]}{z - [\mu/(\lambda(1 - z) + \mu)]} = \frac{(1 - \rho)(z - 1)\mu}{(\lambda - \lambda z + \mu)z - \mu} = \frac{1 - \rho}{1 - \rho z},$$

where we canceled the $z - 1$ term from the numerator and denominator and noted that
$\rho = \lambda/\mu$. By expanding $G_N(z)$ in a power series, we have

$$G_N(z) = \sum_{k=0}^{\infty}(1 - \rho)\rho^k z^k = \sum_{k=0}^{\infty} P[N = k]z^k,$$

which implies that the steady state pmf is

$$P[N = k] = (1 - \rho)\rho^k \qquad k = 0, 1, 2, \ldots,$$

which is in agreement with our previous results for the M/M/1 system.


Example 12.18 M/H2/1 System 


Find the pmf for the number of customers in an M/G/1 system that has arrivals of rate $\lambda$ and where
the service times are hyperexponential random variables of degree two, as shown in Fig. 12.20.
In other words, with probability 1/9 the service time is exponentially distributed with mean $1/\lambda$,
and with probability 8/9 the service time is exponentially distributed with mean $1/(2\lambda)$.

FIGURE 12.20
A hyperexponential service time results if we select an exponential service time of
rate $\lambda$ with probability 1/9 and an exponential service time of rate $2\lambda$ with
probability 8/9.

In order to find $\hat{\tau}(s)$ we note that the pdf of $\tau$ is

$$f_\tau(x) = \frac{1}{9}\lambda e^{-\lambda x} + \frac{8}{9}2\lambda e^{-2\lambda x} \qquad x > 0.$$

Thus the mean service time is

$$E[\tau] = \frac{1}{9}\cdot\frac{1}{\lambda} + \frac{8}{9}\cdot\frac{1}{2\lambda} = \frac{5}{9\lambda},$$

and the server utilization is $\rho = \lambda E[\tau] = 5/9$. The Laplace transform of $f_\tau(x)$ is

$$\hat{\tau}(s) = \frac{1}{9}\frac{\lambda}{s + \lambda} + \frac{8}{9}\frac{2\lambda}{s + 2\lambda} = \frac{18\lambda^2 + 17\lambda s}{9(s + \lambda)(s + 2\lambda)}.$$

Substitution of $\hat{\tau}(\lambda(1 - z))$ into Eq. (12.125) gives

$$G_N(z) = \frac{(1 - \rho)(z - 1)(18\lambda^2 + 17\lambda^2(1 - z))}{9(\lambda - \lambda z + \lambda)(\lambda - \lambda z + 2\lambda)z - (18\lambda^2 + 17\lambda^2(1 - z))} = \frac{(1 - \rho)(z - 1)(35 - 17z)}{9(2 - z)(3 - z)z - (35 - 17z)},$$

where we have canceled $\lambda^2$ from the numerator and denominator. If we factor the denominator
we obtain

$$G_N(z) = \frac{(1 - \rho)(35 - 17z)(z - 1)}{9(z - 1)(z - 7/3)(z - 5/3)} = (1 - \rho)\left\{\frac{1/3}{1 - 3z/7} + \frac{2/3}{1 - 3z/5}\right\},$$

where we have carried out a partial fraction expansion. Finally we note that since $G_N(z)$ converges
for $|z| < 1$, we can expand $G_N(z)$ as follows:

$$G_N(z) = (1 - \rho)\left\{\frac{1}{3}\sum_{k=0}^{\infty}\left(\frac{3}{7}\right)^k z^k + \frac{2}{3}\sum_{k=0}^{\infty}\left(\frac{3}{5}\right)^k z^k\right\}.$$

Since the coefficient of $z^k$ is $P[N = k]$, we finally have that

$$P[N = k] = \frac{4}{27}\left(\frac{3}{7}\right)^k + \frac{8}{27}\left(\frac{3}{5}\right)^k \qquad k = 0, 1, \ldots,$$

where we used the fact that $\rho = 5/9$.
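
The closed-form pmf of Example 12.18 can be cross-checked without inverting the generating function, by solving Eq. (12.113) for $P[N = k + 1]$ one term at a time starting from $P[N = 0] = 1 - \rho$. The sketch below is one way to do this under stated assumptions: the function names and the choice $\lambda = 1$ are illustrative, and the pmf of $M$ uses the fact that, for an exponential service time of rate $\mu$, integrating the Poisson pmf against the exponential pdf gives $P[M = k] = (\mu/(\lambda + \mu))(\lambda/(\lambda + \mu))^k$.

```python
def arrivals_pmf_hyperexp(lam, weights, rates, kmax):
    """P[M = k], k = 0..kmax, for Poisson(lam) arrivals during a
    hyperexponential service time (mixture of exponentials)."""
    return [
        sum(w * (mu / (lam + mu)) * (lam / (lam + mu)) ** k
            for w, mu in zip(weights, rates))
        for k in range(kmax + 1)
    ]


def departure_pmf(a, rho, kmax):
    """Solve Eq. (12.113) recursively for P[N = 0..kmax].

    a[k] = P[M = k]; the k-th equation is rearranged to isolate the
    i = k + 1 term, which multiplies a[0].
    """
    p = [1.0 - rho]  # Eq. (12.119) with E[M] = rho
    for k in range(kmax):
        known = p[0] * a[k] + sum(p[i] * a[k + 1 - i] for i in range(1, k + 1))
        p.append((p[k] - known) / a[0])
    return p


lam = 1.0  # illustrative arrival rate
a = arrivals_pmf_hyperexp(lam, [1 / 9, 8 / 9], [lam, 2 * lam], kmax=10)
pmf = departure_pmf(a, rho=5 / 9, kmax=10)
closed_form = [(4 / 27) * (3 / 7) ** k + (8 / 27) * (3 / 5) ** k for k in range(11)]
print(max(abs(x - y) for x, y in zip(pmf, closed_form)))
```

The recursion only needs the pmf of $M$, so the same approach applies to service time distributions whose transform is not rational, provided $P[M = k]$ can be computed.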




12.7.3 Delay and Waiting Time Distribution in an M/G/1 System 


We now find the delay and waiting time distributions for an M/G/1 system with first-come,
first-served service discipline. If a customer spends $T_j$ seconds in the queueing
system, then the number of customers $N_j$ it leaves behind in the system is the number
of customers that arrive during these $T_j$ seconds, since customers are served in order of
arrival. An expression for the generating function for $N_d$ is found by proceeding as in
Eq. (12.121a):

$$G_{N_d}(z) = \sum_{k=0}^{\infty}\int_0^{\infty} P[N_d = k \mid T = t]\,f_T(t)\,dt\,z^k = \hat{T}(\lambda(1 - z)), \tag{12.126}$$

where $\hat{T}(s)$ is the Laplace transform of the pdf of $T$, the total delay in the system. Since
the steady state distributions of $N_d$ and $N(t)$ are equal, we have that $G_{N_d}(z) = G_N(z)$
and thus, combining Eqs. (12.125) and (12.126),

$$\hat{T}(\lambda(1 - z)) = \frac{(1 - \rho)(z - 1)\hat{\tau}(\lambda(1 - z))}{z - \hat{\tau}(\lambda(1 - z))}. \tag{12.127}$$

If we let $s = \lambda(1 - z)$, Eq. (12.127) yields an expression for $\hat{T}(s)$:

$$\hat{T}(s) = \frac{(1 - \rho)s\hat{\tau}(s)}{s - \lambda + \lambda\hat{\tau}(s)}. \tag{12.128}$$

The pdf of $T$ is found from the inverse transform of $\hat{T}(s)$ either analytically or numerically.

Since $T = W + \tau$, where $W$ and $\tau$ are independent random variables, we have
that

$$\hat{T}(s) = \hat{W}(s)\hat{\tau}(s). \tag{12.129}$$

Equations (12.128) and (12.129) can then be solved for the Laplace transform of the
waiting time pdf:

$$\hat{W}(s) = \frac{(1 - \rho)s}{s - \lambda + \lambda\hat{\tau}(s)}. \tag{12.130}$$

Equations (12.128) and (12.130) are also referred to as the Pollaczek-Khinchin transform
equations.


Example 12.19 M/M/1

Find the pdf’s of $W$ and $T$ for an M/M/1 system. Substituting $\hat{\tau}(s) = \mu/(s + \mu)$ into Eq. (12.128)
gives

$$\hat{T}(s) = \frac{(1 - \rho)s\mu}{(s + \mu)(s - \lambda) + \lambda\mu} = \frac{(1 - \rho)\mu}{s - (\lambda - \mu)}, \tag{12.131}$$

which is readily inverted to obtain

$$f_T(x) = \mu(1 - \rho)e^{-\mu(1 - \rho)x} \qquad x > 0. \tag{12.132}$$

Similarly, Eq. (12.130) gives

$$\hat{W}(s) = \frac{(1 - \rho)s}{s - \lambda + \lambda\mu/(s + \mu)} = (1 - \rho)\frac{s + \mu}{s + \mu - \lambda}.$$

In order to invert this expression, the numerator polynomial must have order lower than that of
the denominator polynomial. We achieve this by dividing the denominator into the numerator:

$$\hat{W}(s) = (1 - \rho)\frac{s + \mu - \lambda + \lambda}{s + \mu - \lambda} = (1 - \rho)\left\{1 + \frac{\lambda}{s + \mu - \lambda}\right\}. \tag{12.133}$$

We then obtain

$$f_W(x) = (1 - \rho)\delta(x) + \lambda(1 - \rho)e^{-\mu(1 - \rho)x} \qquad x > 0. \tag{12.134}$$

The delta function at zero corresponds to the fact that a customer has zero wait with probability
$(1 - \rho)$. Equations (12.132) and (12.134) were previously obtained as Eqs. (12.32) and (12.33) in
Section 12.3 by a different method.


Example 12.20 M/H2/1


Find the pdf of the waiting time in the M/H2/1 system discussed in Example 12.18.
Substitution of $\hat{\tau}(s)$ from Example 12.18 into Eq. (12.130) gives

$$\hat{W}(s) = \frac{9s(1 - \rho)(s + \lambda)(s + 2\lambda)}{9(s - \lambda)(s + \lambda)(s + 2\lambda) + \lambda(18\lambda^2 + 17\lambda s)}$$

$$= \frac{(1 - \rho)(s + \lambda)(s + 2\lambda)}{s^2 + 2\lambda s + 8\lambda^2/9}$$

$$= (1 - \rho)\frac{9s^2 + 27\lambda s + 18\lambda^2}{9s^2 + 18\lambda s + 8\lambda^2}$$

$$= (1 - \rho)\left\{1 + \frac{9\lambda s + 10\lambda^2}{9s^2 + 18\lambda s + 8\lambda^2}\right\}$$

$$= (1 - \rho)\left\{1 + \frac{2\lambda/3}{s + 2\lambda/3} + \frac{\lambda/3}{s + 4\lambda/3}\right\},$$

where we have followed the same sequence of steps as in Example 12.19 and then done a partial
fraction expansion.
The inverse Laplace transform then yields

$$f_W(x) = \frac{4}{9}\delta(x) + \frac{8\lambda}{27}e^{-2\lambda x/3} + \frac{4\lambda}{27}e^{-4\lambda x/3} \qquad x > 0,$$

where we used the fact that $1 - \rho = 4/9$.

Examples 12.18 and 12.19 demonstrate that the Pollaczek-Khinchin transform
equations can be used to obtain closed-form expressions for the pmf of $N(t)$ and the
pdf’s of $W$ and $T$ when the Laplace transform of the service time pdf is a rational function
of $s$, that is, a ratio of polynomials in $s$. This result is particularly important because
it can be shown that the Laplace transform of any service time pdf can be approximated
arbitrarily closely by a rational function of $s$. Thus in principle we can obtain
expressions for the pmf of $N(t)$ and pdf’s of $W$ and $T$ to any desired accuracy.

In addition it should be noted that the Pollaczek—Khinchin transform expressions 
can always be inverted numerically using fast Fourier transform methods such as those 
discussed in Section 7.6. This numerical approach does not require that the Laplace 
transform of the pdf be a rational function of s. 
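
As a numerical sanity check on Eq. (12.130), the sketch below evaluates the waiting time transform for the two service time distributions treated in Examples 12.19 and 12.20 and compares it with the expanded forms obtained there. The function names and the parameter values are illustrative assumptions, and the check is carried out only at a few real points $s > 0$; it is not a transform inversion.

```python
def pk_wait_transform(s, lam, rho, service_transform):
    """Laplace transform of the waiting time pdf, Eq. (12.130); use s > 0."""
    return (1.0 - rho) * s / (s - lam + lam * service_transform(s))


# M/M/1 with illustrative rates lam = 2, mu = 5 (Example 12.19)
lam, mu = 2.0, 5.0
rho = lam / mu
mm1_errors = [
    abs(pk_wait_transform(s, lam, rho, lambda x: mu / (x + mu))
        - (1 - rho) * (s + mu) / (s + mu - lam))
    for s in (0.5, 1.0, 3.0)
]

# M/H2/1 of Examples 12.18 and 12.20 with lam2 = 1, so rho2 = 5/9
lam2, rho2 = 1.0, 5.0 / 9.0


def h2_transform(s):
    """Transform of the hyperexponential pdf with rates lam2 and 2*lam2."""
    return (1 / 9) * lam2 / (s + lam2) + (8 / 9) * 2 * lam2 / (s + 2 * lam2)


def h2_wait_closed_form(s):
    """Partial fraction form of the waiting time transform, Example 12.20."""
    return (1 - rho2) * (1 + (2 * lam2 / 3) / (s + 2 * lam2 / 3)
                         + (lam2 / 3) / (s + 4 * lam2 / 3))


h2_errors = [
    abs(pk_wait_transform(s, lam2, rho2, h2_transform) - h2_wait_closed_form(s))
    for s in (0.5, 1.0, 3.0)
]
print(max(mm1_errors), max(h2_errors))
```

Both error lists should be at the level of floating-point rounding if the algebra in the examples is consistent with Eq. (12.130).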


BURKE'S THEOREM: DEPARTURES FROM M/M/c SYSTEMS 


In many problems, a customer requires service from several service stations before a
task is completed. These problems require that we consider a network of queueing systems.
In such networks, the departures from some queues become the arrivals to other
queues. This is the reason why we are interested in the statistical properties of the
departure process from a queue.

Consider two queues in tandem as shown in Fig. 12.21, where the departures from
the first queue become the arrivals at the second queue. Assume that the arrivals to the
first queue are Poisson with rate $\lambda$ and that the service time at queue 1 is exponentially
distributed with rate $\mu_1 > \lambda$. Assume that the service time in queue 2 is also
exponentially distributed with rate $\mu_2 > \lambda$.

FIGURE 12.21
Two tandem exponential queues with Poisson input.

The state of this system is specified by the number of customers in the two
queues, $(N_1(t), N_2(t))$. This state vector forms a Markov process with the transition
rate diagram shown in Fig. 12.22, and global balance equations:

$$\lambda P[N_1 = 0, N_2 = 0] = \mu_2 P[N_1 = 0, N_2 = 1] \tag{12.135a}$$

$$(\lambda + \mu_1)P[N_1 = n, N_2 = 0] = \mu_2 P[N_1 = n, N_2 = 1] + \lambda P[N_1 = n - 1, N_2 = 0] \qquad n > 0 \tag{12.135b}$$

$$(\lambda + \mu_2)P[N_1 = 0, N_2 = m] = \mu_2 P[N_1 = 0, N_2 = m + 1] + \mu_1 P[N_1 = 1, N_2 = m - 1] \qquad m > 0 \tag{12.135c}$$

$$(\lambda + \mu_1 + \mu_2)P[N_1 = n, N_2 = m] = \mu_2 P[N_1 = n, N_2 = m + 1] + \mu_1 P[N_1 = n + 1, N_2 = m - 1] + \lambda P[N_1 = n - 1, N_2 = m] \qquad n > 0, m > 0. \tag{12.135d}$$

FIGURE 12.22
Transition rate diagram for two tandem exponential queues with
Poisson input.

It is easy to verify that the following joint state pmf satisfies Eqs. (12.135a)
through (12.135d):

$$P[N_1 = n, N_2 = m] = (1 - \rho_1)\rho_1^n(1 - \rho_2)\rho_2^m \qquad n \geq 0, m \geq 0, \tag{12.136}$$

where $\rho_i = \lambda/\mu_i$. We know that the first queue is an M/M/1 system, so

$$P[N_1 = n] = (1 - \rho_1)\rho_1^n \qquad n = 0, 1, \ldots. \tag{12.137}$$

By summing Eq. (12.136) over all $n$, we obtain the marginal state pmf of the second queue:

$$P[N_2 = m] = (1 - \rho_2)\rho_2^m \qquad m \geq 0. \tag{12.138}$$

Equations (12.136) through (12.138) imply that

$$P[N_1 = n, N_2 = m] = P[N_1 = n]P[N_2 = m] \qquad \text{for all } n, m. \tag{12.139}$$

In words, the number of customers at queue 1 and the number at queue 2 at the same time
instant are independent random variables. Furthermore, the steady state pmf at the second
queue is that of an M/M/1 system with Poisson arrival rate $\lambda$ and exponential service times of rate $\mu_2$.

We say that a network of queues has a product-form solution when the joint pmf
of the vector of numbers of customers at the various queues is equal to the product of
the marginal pmf’s of the number in the individual queues. We now discuss Burke’s
theorem, which states the fundamental result underlying the product-form solution in
Eq. (12.139).
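
The claim that Eq. (12.136) satisfies the global balance equations can be verified numerically. The sketch below computes, for each state $(n, m)$, the difference between the probability flow out of the state and the flow into it; the function names and the rates `lam = 1`, `mu1 = 2`, `mu2 = 3` are assumptions for illustration. Treating the pmf as zero for negative indices lets one expression cover all four cases (12.135a) through (12.135d).

```python
def product_form(n, m, rho1, rho2):
    """Joint pmf of Eq. (12.136); zero outside the state space."""
    if n < 0 or m < 0:
        return 0.0
    return (1 - rho1) * rho1**n * (1 - rho2) * rho2**m


def balance_residual(n, m, lam, mu1, mu2):
    """Flow out of state (n, m) minus flow into it, Eqs. (12.135a)-(12.135d)."""
    rho1, rho2 = lam / mu1, lam / mu2
    out_rate = lam + (mu1 if n > 0 else 0.0) + (mu2 if m > 0 else 0.0)
    inflow = (mu2 * product_form(n, m + 1, rho1, rho2)        # departure from queue 2
              + mu1 * product_form(n + 1, m - 1, rho1, rho2)  # transfer queue 1 -> 2
              + lam * product_form(n - 1, m, rho1, rho2))     # external arrival
    return out_rate * product_form(n, m, rho1, rho2) - inflow


lam, mu1, mu2 = 1.0, 2.0, 3.0  # illustrative rates with mu1, mu2 > lam
worst = max(abs(balance_residual(n, m, lam, mu1, mu2))
            for n in range(6) for m in range(6))
print(worst)
```

A residual at the level of floating-point rounding for every state checked is consistent with the product form being the steady state solution.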
Burke's Theorem
Consider an M/M/1, M/M/c, or M/M/∞ queueing system at steady state with arrival rate $\lambda$. Then


1. The departure process is Poisson with rate $\lambda$.


2. At each time $t$, the number of customers in the system $N(t)$ is independent of the
sequence of departure times prior to $t$.


The product-form solution for the two tandem queues follows from Burke’s theorem.
Queue 1 is an M/M/1 queue, so from part 1 of the theorem the departures from
queue 1 form a Poisson process. Thus the arrivals to queue 2 are a Poisson process, so
the second queue is also an M/M/1 system with steady state pmf given by Eq. (12.138).
It remains to show that the numbers of customers in the two queues at the same time
instant are independent random variables.

The arrivals to queue 2 prior to time $t$ are the departures from queue 1 prior to
time $t$. By part 2 of Burke’s theorem the departures from queue 1, and hence the arrivals
to queue 2, prior to time $t$ are independent of $N_1(t)$. Since $N_2(t)$ is determined by
the sequence of arrivals from queue 1 prior to time $t$ and the independent sequence of
service times, it then follows that $N_1(t)$ and $N_2(t)$ are independent. Equation (12.139)
then follows. Note that Burke’s theorem does not state that $N_1(t)$ and $N_2(t)$ are independent
random processes. This would require that $N_1(t_1)$ and $N_2(t_2)$ be independent
random variables for all $t_1$ and $t_2$. This is clearly not the case.

Burke’s theorem implies that the generalization of Eq. (12.139) holds for the tandem
combination of any number of M/M/1, M/M/c, or M/M/∞ queues. Indeed, the result
holds for any “feedforward” network of queues in which a customer cannot visit
any queue more than once.


Example 12.21 


Find the joint state pmf for the network of queues shown in Fig. 12.23, where queue 1 is driven
by a Poisson process of rate $\lambda_1$, where the departures from queue 1 are randomly routed to
queues 2 and 3, and where queue 3 also has an additional independent Poisson arrival stream of
rate $\lambda_2$.

FIGURE 12.23
A feedforward network of queues.

From Burke’s theorem $N_1(t)$ and $N_2(t)$ are independent, as are $N_1(t)$ and $N_3(t)$. Since the
random split of a Poisson process yields independent Poisson processes, we have that the inputs
to queues 2 and 3 are independent. The input to queue 2 is Poisson with rate $\lambda_1/2$. The input to
queue 3 is Poisson of rate $\lambda_1/2 + \lambda_2$ since the merge of two independent Poisson processes is
also Poisson. Thus

$$P[N_1(t) = k, N_2(t) = m, N_3(t) = n] = (1 - \rho_1)\rho_1^k(1 - \rho_2)\rho_2^m(1 - \rho_3)\rho_3^n \qquad k, m, n \geq 0,$$

where $\rho_1 = \lambda_1/\mu_1$, $\rho_2 = \lambda_1/(2\mu_2)$, and $\rho_3 = (\lambda_1/2 + \lambda_2)/\mu_3$, and where we have assumed that all
of the queues are stable.
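
The joint pmf of this example is simple to evaluate once the three utilizations are computed. The following sketch assumes the equal-probability routing described above; the function names and the numerical rates are illustrative choices only.

```python
def mm1_pmf(k, rho):
    """Steady state pmf of an M/M/1 queue, (1 - rho) rho^k."""
    return (1.0 - rho) * rho**k


def feedforward_joint_pmf(k, m, n, lam1, lam2, mu1, mu2, mu3):
    """Product-form joint pmf for the three-queue network of Example 12.21."""
    rho1 = lam1 / mu1                # queue 1 sees the external stream lam1
    rho2 = lam1 / (2.0 * mu2)        # half of queue 1's departures
    rho3 = (lam1 / 2.0 + lam2) / mu3  # other half merged with lam2
    if max(rho1, rho2, rho3) >= 1.0:
        raise ValueError("all queues must be stable (utilization < 1)")
    return mm1_pmf(k, rho1) * mm1_pmf(m, rho2) * mm1_pmf(n, rho3)


# Illustrative rates giving rho1 = 0.5, rho2 = 0.25, rho3 = 0.5
params = dict(lam1=1.0, lam2=0.5, mu1=2.0, mu2=2.0, mu3=2.0)
p_empty = feedforward_joint_pmf(0, 0, 0, **params)
total = sum(feedforward_joint_pmf(k, m, n, **params)
            for k in range(40) for m in range(40) for n in range(40))
print(p_empty, total)
```

The truncated sum over a 40 by 40 by 40 block of states should be very close to one, which is a quick check that the three utilizations were assembled correctly.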


Proof of Burke’s Theorem Using Time Reversibility 


Consider the sample path of an M/M/1, M/M/c, or M/M/co system as shown in 
Fig. 12.24(a). Note that the arrivals in the forward process correspond to the departures 
in the time-reversed process. In Section 11.5, we showed that birth-and-death Markov 
chains in steady state are time-reversible processes; that is, the sample functions of 
the process played backward in time have the same statistics as the forward process. 
Since M/M/1, M/M/c, and M/M/co systems are birth-and-death Markov chains, we 


N(t) 4 


Forward time > 
=< Reverse time 


| | re 
a b 


(a) 


Departure times prior to t 
in forward process 


tiot t 


Arrival times after t 


~ 


in reverse process 


(b) 


FIGURE 12.24 

(a) Time instant a is an arrival time in the forward process and a departure time in the 
reverse process. Time instant b is a departure in the forward process and an arrival in the 
reverse process. (b) The departure times prior to time t in the forward process correspond 
exactly to the arrival times after time t in the reverse process. 
have that their states are reversible processes. Thus the sample functions of these systems
played backward in time correspond to the sample functions of queueing systems
of the same type. It then follows that the arrival process of the time-reversed system is
a Poisson process.

To prove part 1 of Burke’s theorem, we note that the interdeparture times of the
forward-time system are the interarrival times of the time-reversed system. Since the
arrival process of the time-reversed system is Poisson, it then follows that the departure
process of the forward system is also Poisson. Thus we have shown that the departure
process of an M/M/1, M/M/c, or M/M/∞ system is Poisson.

To prove part 2 of Burke’s theorem, fix a time $t$ as shown in Fig. 12.24(b). The departures
before time $t$ from the forward system are the arrivals after time $t$ in the reverse
system. In the reverse system, the arrivals are Poisson and thus the arrival times
after time $t$ do not depend on $N(t)$. These arrival instants of the reverse process are exactly
the departure instants before $t$ in the forward process. It then follows that $N(t)$
and the departure instants prior to $t$ are independent, so part 2 is proved.


12.9 NETWORKS OF QUEUES: JACKSON'S THEOREM


In many queueing networks, a customer is allowed to visit a particular queue more 
than once. Burke’s theorem does not hold for such systems. In this section we discuss 
Jackson’s theorem, which extends the product-form solution for the steady state pmf to 
a broader class of queueing networks. 

If a customer is allowed to visit a queue more than once, then the arrival process 
at that queue will not be Poisson. For example, consider the simple M/M/1 queue with 
feedback shown in Fig. 12.25, where external customers arrive according to a Poisson 
process of rate $\lambda$ and where departures are instantaneously fed back into the system
with probability 0.9. If the arrival rate is much less than the departure rate, then we have
that the net arrival process (i.e., external and feedback arrivals) typically consists of 
isolated external arrivals followed by a burst of feedback arrivals. Thus the arrival 
process does not have independent increments and so it is not Poisson. 


12.9.1 Open Networks of Queues


Consider a network of $K$ queues in which customers arrive from outside the network
to queue $k$ according to independent Poisson processes of rate $\alpha_k$. We assume that the
service time of a customer in queue $k$ is exponentially distributed with rate $\mu_k$ and
independent of all other service times and arrival processes. We also suppose that queue
$k$ has $c_k$ servers. After completion of service in queue $k$, a customer proceeds to queue
$i$ with probability $P_{ki}$ and exits the network with probability

$$1 - \sum_{i=1}^{K} P_{ki}.$$

FIGURE 12.25
A queue with feedback.


The total arrival rate $\lambda_k$ into queue $k$ is the sum of the external arrival rate and
the internal arrival rates:

$$\lambda_k = \alpha_k + \sum_{j=1}^{K} \lambda_j P_{jk} \qquad k = 1, \ldots, K. \tag{12.140}$$

It can be shown that Eq. (12.140) has a unique solution if no customer remains in the
network indefinitely. We call such networks open queueing networks.
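
Equation (12.140) is a set of $K$ linear equations, so it can be solved with any linear solver. The sketch below uses plain Gauss-Jordan elimination with exact fractions; the function name is hypothetical and the two-queue routing matrix is only an illustration (external arrivals at rate 1 to queue 1, which forwards half of its departures to queue 2, which returns everything to queue 1).

```python
from fractions import Fraction


def solve_traffic_equations(alpha, P):
    """Solve lambda_k = alpha_k + sum_j lambda_j * P[j][k] for all k."""
    K = len(alpha)
    # Augmented matrix of (I - P^T) lambda = alpha; row k holds equation k.
    A = [[Fraction(int(i == k)) - Fraction(P[i][k]) for i in range(K)]
         + [Fraction(alpha[k])] for k in range(K)]
    for col in range(K):
        # Raises StopIteration if the system is singular (network not open).
        pivot = next(r for r in range(col, K) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(K):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[k][K] for k in range(K)]


# Illustrative network: P[j][k] is the probability of going from queue j to queue k.
alpha = [Fraction(1), Fraction(0)]
P = [[Fraction(0), Fraction(1, 2)],
     [Fraction(1), Fraction(0)]]
lam = solve_traffic_equations(alpha, P)
```

Here the total arrival rates come out as 2 and 1, twice and once the external rate, because each customer visits queue 1 twice on average.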

The vector of the number of customers in all the queues,

$$\mathbf{N}(t) = (N_1(t), N_2(t), \ldots, N_K(t)),$$

is a Markov process. Jackson’s theorem gives the steady state pmf for $\mathbf{N}(t)$.


Jackson's Theorem 


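If $\lambda_k < c_k\mu_k$, then for any possible state $\mathbf{n} = (n_1, n_2, \ldots, n_K)$,

$$P[\mathbf{N}(t) = \mathbf{n}] = P[N_1 = n_1]\,P[N_2 = n_2] \cdots P[N_K = n_K], \tag{12.141}$$

where $P[N_k = n_k]$ is the steady state pmf of an M/M/$c_k$ system with arrival rate $\lambda_k$ and service
rate $\mu_k$.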


Jackson’s theorem states that the numbers of customers in the queues at time $t$
are independent random variables. In addition, it states that the steady state probabilities
of the individual queues are those of an M/M/$c_k$ system. This is an amazing result
because in general the input process to a queue is not Poisson, as was demonstrated in
the simple queue with feedback discussed in the beginning of this section.


Example 12.22 


Messages arrive at a concentrator according to a Poisson process of rate $\alpha$. The time required to
transmit a message and receive an acknowledgment is exponentially distributed with mean $1/\mu$.
Suppose that a message needs to be retransmitted with probability $p$. Find the steady state pmf
for the number of messages in the concentrator.

The overall system can be represented by the simple queue with feedback shown in
Fig. 12.25. The net arrival rate into the queue is $\lambda = \alpha + \lambda p$, that is,

$$\lambda = \frac{\alpha}{1 - p}.$$

Thus, the pmf for the number of messages in the concentrator is

$$P[N = n] = (1 - \rho)\rho^{n} \qquad n = 0, 1, \ldots,$$

where $\rho = \lambda/\mu = \alpha/((1 - p)\mu)$.

FIGURE 12.26
An open queueing network model for a computer system.


Example 12.23 


New programs arrive at a CPU according to a Poisson process of rate $\alpha$ as shown in Fig. 12.26. A
program spends an exponentially distributed execution time of mean $1/\mu_1$ in the CPU. At the
end of this service time, the program execution is complete with probability $p$ or it requires
retrieving additional information from secondary storage with probability $1 - p$. Suppose that the
retrieval of information from secondary storage requires an exponentially distributed amount of
time with mean $1/\mu_2$. Find the mean time that each program spends in the system.

The net arrival rates into the two queues are

$$\lambda_1 = \alpha + \lambda_2 \quad \text{and} \quad \lambda_2 = (1 - p)\lambda_1.$$

Thus

$$\lambda_1 = \frac{\alpha}{p} \quad \text{and} \quad \lambda_2 = \frac{(1 - p)\alpha}{p}.$$

Each queue behaves like an M/M/1 system, so

$$E[N_1] = \frac{\rho_1}{1 - \rho_1} \quad \text{and} \quad E[N_2] = \frac{\rho_2}{1 - \rho_2},$$

where $\rho_1 = \lambda_1/\mu_1$ and $\rho_2 = \lambda_2/\mu_2$. Little’s formula then gives the mean for the total time spent
in the system:

$$E[T] = \frac{E[N_1 + N_2]}{\alpha} = \frac{1}{\alpha}\left[\frac{\rho_1}{1 - \rho_1} + \frac{\rho_2}{1 - \rho_2}\right].$$
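
The calculation in this example can be sketched in a few lines. The function name and the numerical values below are assumptions made for illustration; the formulas are the M/M/1 mean number in system for each queue followed by Little's formula applied to the external arrival rate.

```python
def mean_time_in_system(alpha, p, mu1, mu2):
    """Mean total time a program spends in the CPU / secondary storage network."""
    lam1 = alpha / p                    # net arrival rate into the CPU queue
    lam2 = (1.0 - p) * alpha / p        # net arrival rate into secondary storage
    rho1, rho2 = lam1 / mu1, lam2 / mu2
    if rho1 >= 1.0 or rho2 >= 1.0:
        raise ValueError("both queues must be stable")
    mean_n1 = rho1 / (1.0 - rho1)       # M/M/1 mean number in system
    mean_n2 = rho2 / (1.0 - rho2)
    return (mean_n1 + mean_n2) / alpha  # Little's formula on the whole network


# Made-up parameters: alpha = 0.2 programs/s, p = 1/2, unit service rates.
mean_t = mean_time_in_system(alpha=0.2, p=0.5, mu1=1.0, mu2=1.0)
```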


12.9.2 Proof of Jackson's Theorem


Jackson’s theorem can be proved by writing the global balance equations for the queue- 
ing network and verifying that the solution is given by Eq. (12.141). We present an al- 
ternative proof of the theorem using a result from time-reversed Markov chains. For 
notational simplicity we consider only the case of a network of single-server queues. 

Let $\mathbf{n}$ and $\mathbf{n}'$ be two possible states of the network, and let $v_{\mathbf{n},\mathbf{n}'}$ denote the
transition rate from $\mathbf{n}$ to $\mathbf{n}'$. In Section 11.5, we found that if we can guess a state pmf $P[\mathbf{n}]$
and a set of transition rates $\tilde{v}_{\mathbf{n}',\mathbf{n}}$ for the reverse process such that (Eq. 11.65)

$$P[\mathbf{n}]\, v_{\mathbf{n},\mathbf{n}'} = P[\mathbf{n}']\, \tilde{v}_{\mathbf{n}',\mathbf{n}} \tag{12.142a}$$

and such that the total rate out of state $\mathbf{n}$ is the same in the forward and reverse
processes (Eq. 11.64 summed over $j$)

$$\sum_{\mathbf{m}} v_{\mathbf{n},\mathbf{m}} = \sum_{\mathbf{m}} \tilde{v}_{\mathbf{n},\mathbf{m}}, \tag{12.142b}$$

then $P[\mathbf{n}]$ is the steady state pmf of the process.
For the case under consideration our guess for the pmf is

$$P[\mathbf{n}] = \prod_{j=1}^{K} (1 - \rho_j)\rho_j^{n_j}, \tag{12.143}$$

so the proof reduces to finding a consistent set of transition rates for the reverse
process that satisfy Eqs. (12.142a) and (12.142b). Noting that $v_{\mathbf{n},\mathbf{n}'}$ is known and that
$P[\mathbf{n}]$ and $P[\mathbf{n}']$ are specified by Eq. (12.143), Eq. (12.142a) can be solved for the
transition rates of the reverse process:

$$\tilde{v}_{\mathbf{n}',\mathbf{n}} = \frac{P[\mathbf{n}]\, v_{\mathbf{n},\mathbf{n}'}}{P[\mathbf{n}']}. \tag{12.144}$$
Let $\mathbf{n} = (n_1, \ldots, n_K)$ denote a state for the network, and let
$\mathbf{e}_k = (0, \ldots, 0, 1, 0, \ldots, 0)$, where the 1 is located in the $k$th component. Only three
types of transitions in the state of the queueing network have nonzero probabilities. In
the first type of transition, an external arrival to queue $k$ takes the state from $\mathbf{n}$ to
$\mathbf{n} + \mathbf{e}_k$. In the second type of transition, a departure from queue $k$ exits the network
and takes the state from $\mathbf{n}$ to $\mathbf{n} - \mathbf{e}_k$, where $n_k > 0$. In the third type of transition, a
customer leaves queue $k$ and proceeds to queue $j$, thus taking the state from $\mathbf{n}$ to
$\mathbf{n} - \mathbf{e}_k + \mathbf{e}_j$, where $n_k > 0$. Table 12.1 shows three types of transitions and their
corresponding rates for the forward process.

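A consistent set of transition rates for the reverse process is obtained by solving
Eq. (12.144) for the three types of transitions possible. For example, if we let
$\mathbf{n}' = \mathbf{n} + \mathbf{e}_k$, then the transition $\mathbf{n} \to \mathbf{n} + \mathbf{e}_k$ in the forward process corresponds to the
transition $\mathbf{n} + \mathbf{e}_k \to \mathbf{n}$ in the reverse process. Equation (12.144) gives

$$\tilde{v}_{\mathbf{n}+\mathbf{e}_k,\mathbf{n}} = \frac{\alpha_k \prod_{j=1}^{K} (1 - \rho_j)\rho_j^{n_j}}{\rho_k \prod_{j=1}^{K} (1 - \rho_j)\rho_j^{n_j}} = \frac{\alpha_k}{\rho_k} = \frac{\alpha_k\mu_k}{\lambda_k}.$$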


The other reverse process transition rates are found in similar manner. Table 12.1 
shows the results for the transition rates of the reverse process that are implied by 
Eq. (12.144). 

TABLE 12.1 Allowable transitions in Jackson network and their
corresponding rates in the forward and reverse processes

Forward Process
Transition                                                  Rate
$\mathbf{n} \to \mathbf{n} + \mathbf{e}_k$                  $\alpha_k$                                       all $k$
$\mathbf{n} \to \mathbf{n} - \mathbf{e}_k$                  $\mu_k\left(1 - \sum_{j=1}^{K} P_{kj}\right)$    all $k$: $n_k > 0$
$\mathbf{n} \to \mathbf{n} - \mathbf{e}_k + \mathbf{e}_j$   $\mu_k P_{kj}$                                   all $k$: $n_k > 0$, all $j$

Reverse Process
Transition                                                  Rate
$\mathbf{n} \to \mathbf{n} + \mathbf{e}_k$                  $\lambda_k\left(1 - \sum_{j=1}^{K} P_{kj}\right)$  all $k$
$\mathbf{n} \to \mathbf{n} - \mathbf{e}_k$                  $\alpha_k\mu_k/\lambda_k$                        all $k$: $n_k > 0$
$\mathbf{n} \to \mathbf{n} - \mathbf{e}_k + \mathbf{e}_j$   $\lambda_j P_{jk}\mu_k/\lambda_k$                all $k$: $n_k > 0$, all $j$

The proof that the pmf in Eq. (12.143) gives the steady state pmf of the network
of queues is completed by showing that the total transition rate out of any state $\mathbf{n}$ is
the same in the forward and in the reverse process, that is, Eq. (12.142b) holds. In the
forward process, the total transition rate out of state $\mathbf{n}$ is obtained by adding the
entries for the forward process in Table 12.1:


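$$\sum_{\mathbf{m}} v_{\mathbf{n},\mathbf{m}} = \sum_{k} \alpha_k + \sum_{k:\, n_k > 0} \mu_k. \tag{12.145a}$$

For the reverse process, we have from Table 12.1 that

$$\sum_{\mathbf{m}} \tilde{v}_{\mathbf{n},\mathbf{m}} = \sum_{k} \lambda_k\left(1 - \sum_{j} P_{kj}\right) + \sum_{k:\, n_k > 0} \left\{\frac{\alpha_k\mu_k}{\lambda_k} + \sum_{j} \frac{\lambda_j P_{jk}\mu_k}{\lambda_k}\right\}. \tag{12.145b}$$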


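We need to show that the right-hand sides of Eqs. (12.145a) and (12.145b) are equal.
First, note that Eq. (12.140) implies that

$$\lambda_k - \alpha_k = \sum_{j} \lambda_j P_{jk}.$$

The right-hand side of Eq. (12.145b) then becomes

$$\begin{aligned}
&\sum_{k} \lambda_k - \sum_{k}\sum_{j} \lambda_k P_{kj} + \sum_{k:\, n_k > 0} \left\{\frac{\alpha_k\mu_k}{\lambda_k} + \frac{\mu_k}{\lambda_k}\sum_{j} \lambda_j P_{jk}\right\} \\
&\qquad = \sum_{k} \lambda_k - \sum_{j} (\lambda_j - \alpha_j) + \sum_{k:\, n_k > 0} \left\{\frac{\alpha_k\mu_k}{\lambda_k} + \frac{\mu_k}{\lambda_k}(\lambda_k - \alpha_k)\right\} \\
&\qquad = \sum_{k} \alpha_k + \sum_{k:\, n_k > 0} \mu_k.
\end{aligned}$$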


Thus the right-hand sides of Eqs. (12.145a) and (12.145b) are equal and thus Eq. (12.143) 
is the steady state pmf of the network of queues. This completes the proof of Jackson’s 
theorem for a network of single-server queues. 
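
The equality of the total rates in Eqs. (12.145a) and (12.145b) can be spot-checked numerically. The sketch below is not a proof; it evaluates both sums exactly for a few states of a small made-up two-queue network whose arrival rates satisfy Eq. (12.140). All names and parameter values are illustrative assumptions.

```python
from fractions import Fraction as F


def total_rates(n, alpha, mu, lam, P):
    """Return (forward, reverse) total transition rates out of state n."""
    K = len(n)
    busy = [k for k in range(K) if n[k] > 0]
    # Forward process, Eq. (12.145a): external arrivals plus active servers.
    forward = sum(alpha) + sum(mu[k] for k in busy)
    # Reverse process, Eq. (12.145b): reverse "arrivals" to every queue ...
    reverse = sum(lam[k] * (1 - sum(P[k])) for k in range(K))
    # ... plus reverse departures and reverse routing from nonempty queues.
    for k in busy:
        reverse += alpha[k] * mu[k] / lam[k]
        reverse += sum(lam[j] * P[j][k] for j in range(K)) * mu[k] / lam[k]
    return forward, reverse


# Illustrative network; lam solves lam_k = alpha_k + sum_j lam_j P[j][k].
alpha = [F(1), F(0)]
P = [[F(0), F(1, 2)], [F(1), F(0)]]
lam = [F(2), F(1)]
mu = [F(4), F(3)]
checks = [total_rates(n, alpha, mu, lam, P) for n in [(0, 0), (2, 0), (0, 3), (1, 1)]]
```

For each listed state the two totals agree, as the algebra above requires.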
12.9.3 Closed Networks of Queues


In some problems, a fixed number of customers, say $I$, circulate endlessly in a closed
network of queues. For example, some computer system models assume that at any
time a fixed number of processes use the CPU and input/output (I/O) resources of a
computer as shown in Fig. 12.27. We now consider queueing networks that are identical
to the previously discussed open networks except that the external arrival rates are
zero and the networks always contain a fixed number of customers $I$. We show that the
steady state pmf for such systems is product form but that the states of the queues are
no longer independent.
The net arrival rate into queue $k$ is now given by

$$\lambda_k = \sum_{j=1}^{K} \lambda_j P_{jk} \qquad k = 1, \ldots, K. \tag{12.146}$$


Note that these equations have the same form as the set of equations that define the
stationary pmf for a discrete-time Markov chain with transition probabilities $P_{jk}$. The
only difference is that the sum of the $\lambda_k$’s need not be one. Thus the solution vector to
Eq. (12.146) must be proportional to the stationary pmf $\{\pi_j\}$ corresponding to $\{P_{jk}\}$:

$$\lambda_k = \lambda(I)\,\pi_k, \tag{12.147}$$

where

$$\pi_k = \sum_{j=1}^{K} \pi_j P_{jk} \tag{12.148}$$

and where $\lambda(I)$ is a constant that depends on $I$, the number of customers in the queueing
network. If we sum both sides of Eq. (12.147) over $k$, we see that $\lambda(I)$ is the sum of
the arrival rates in all the queues in the network, and $\pi_k = \lambda_k/\lambda(I)$ is the fraction of
total arrivals to queue $k$.
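
The stationary pmf in Eq. (12.148) can be found numerically when the routing chain is irreducible. The sketch below averages each iterate with its one-step update, a "lazy" iteration chosen here (it is not a method from the text) so that periodic routing such as a pure cycle still converges. The function name and the example matrix are assumptions.

```python
def routing_stationary_pmf(P, iterations=2000):
    """Approximate pi solving pi_k = sum_j pi_j * P[j][k] with sum(pi) = 1."""
    K = len(P)
    pi = [1.0 / K] * K
    for _ in range(iterations):
        step = [sum(pi[j] * P[j][k] for j in range(K)) for k in range(K)]
        # Averaging with the previous iterate avoids oscillation for periodic chains.
        pi = [0.5 * (old + new) for old, new in zip(pi, step)]
    return pi


# Illustrative routing: queue 1 returns to itself w.p. 1/2, else goes to queue 2.
P = [[0.5, 0.5],
     [1.0, 0.0]]
pi = routing_stationary_pmf(P)
```

For this matrix the iteration settles at $\pi = (2/3, 1/3)$, which satisfies Eq. (12.148) by direct substitution.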


Theorem 


Let $\lambda_k = \lambda(I)\pi_k$ be a solution to Eq. (12.146), and let $\mathbf{n} = (n_1, n_2, \ldots, n_K)$ be any state of the
network for which $n_1, \ldots, n_K \ge 0$ and

$$n_1 + n_2 + \cdots + n_K = I, \tag{12.149}$$

then

$$P[\mathbf{N}(t) = \mathbf{n}] = \frac{P[N_1 = n_1]\,P[N_2 = n_2] \cdots P[N_K = n_K]}{S(I)}, \tag{12.150}$$

where $P[N_k = n_k]$ is the steady state pmf of an M/M/$c_k$ system with arrival rate $\lambda_k$ and service
rate $\mu_k$, and where $S(I)$ is the normalization constant given by

$$S(I) = \sum_{\mathbf{n}:\, n_1 + \cdots + n_K = I} P[N_1 = n_1]\,P[N_2 = n_2] \cdots P[N_K = n_K]. \tag{12.151}$$

FIGURE 12.27
A closed queueing network model for a computer system.


Equation (12.150) states that $P[\mathbf{N}(t) = \mathbf{n}]$ has a product form. However,
$P[\mathbf{N}(t) = \mathbf{n}]$ is no longer equal to the product of the marginal pmf’s because of the
normalization constant $S(I)$. This constant arises because the fact that there are always $I$
customers in the network implies that the allowable states $\mathbf{n}$ must satisfy Eq. (12.149). The
theorem can be proved by taking the approach used to prove Jackson’s theorem above.
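
For small networks the normalization can be carried out by brute-force enumeration of the allowable states. The sketch below assumes single-server queues, for which each state has weight $\prod_k \rho_k^{n_k}$; the factors $(1 - \rho_k)$ and the unknown constant $\lambda(I)$ are common to every state and cancel, so only the relative loads $\pi_k/\mu_k$ are needed. The function name and parameters are illustrative.

```python
from itertools import product


def closed_network_pmf(pi, mu, customers):
    """Return {state: probability} for a closed network of single-server queues."""
    loads = [pi_k / mu_k for pi_k, mu_k in zip(pi, mu)]   # relative loads
    weights = {}
    for state in product(range(customers + 1), repeat=len(pi)):
        if sum(state) == customers:                        # Eq. (12.149)
            weight = 1.0
            for load, n_k in zip(loads, state):
                weight *= load ** n_k
            weights[state] = weight
    total = sum(weights.values())                          # plays the role of S(I)
    return {state: w / total for state, w in weights.items()}


# Illustrative two-queue network with pi = (2/3, 1/3), unit service rates, I = 2.
pmf = closed_network_pmf(pi=[2 / 3, 1 / 3], mu=[1.0, 1.0], customers=2)
```

The enumeration grows combinatorially with the number of queues and customers, which is the practical difficulty that mean value analysis avoids.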


Example 12.24 


Suppose that the computer system in Example 12.23 is operated so that there are always $I$ programs
in the system. The resulting network of queues is shown in Fig. 12.27. Note that the feedback
loop around the CPU signifies the completion of one job and its instantaneous replacement
by another one. Find the steady state pmf of the system. Find the rate at which programs are
completed.

The stationary probabilities associated with Eq. (12.146) are found by solving

$$\pi_1 = p\pi_1 + \pi_2, \quad \pi_2 = (1 - p)\pi_1, \quad \text{and} \quad \pi_1 + \pi_2 = 1.$$

The stationary probabilities are then

$$\pi_1 = \frac{1}{2 - p} \quad \text{and} \quad \pi_2 = \frac{1 - p}{2 - p}, \tag{12.152}$$

and the arrival rates are

$$\lambda_1 = \frac{\lambda(I)}{2 - p} \quad \text{and} \quad \lambda_2 = \frac{(1 - p)\lambda(I)}{2 - p}. \tag{12.153}$$

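The stationary pmf for the network is then

$$P[N_1 = i, N_2 = I - i] = \frac{(1 - \rho_1)\rho_1^{i}\,(1 - \rho_2)\rho_2^{I-i}}{S(I)} \qquad 0 \le i \le I, \tag{12.154}$$

where $\rho_1 = \lambda_1/\mu_1$ and $\rho_2 = \lambda_2/\mu_2$, and where we have used the fact that if $N_1 = i$ then
$N_2 = I - i$. The normalization constant is then

$$S(I) = (1 - \rho_1)(1 - \rho_2) \sum_{i=0}^{I} \rho_1^{i}\rho_2^{I-i} = (1 - \rho_1)(1 - \rho_2)\,\rho_2^{I}\,\frac{1 - (\rho_1/\rho_2)^{I+1}}{1 - (\rho_1/\rho_2)}. \tag{12.155}$$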
Substitution of Eq. (12.155) into Eq. (12.154) gives

$$P[N_1 = i, N_2 = I - i] = \frac{(1 - \beta)\beta^{i}}{1 - \beta^{I+1}} \qquad 0 \le i \le I, \tag{12.156}$$

where

$$\beta = \frac{\rho_1}{\rho_2} = \frac{\pi_1\mu_2}{\pi_2\mu_1} = \frac{\mu_2}{(1 - p)\mu_1}. \tag{12.157}$$

Note that the form of Eq. (12.156) suggests that queue 1 behaves like an M/M/1/K queue. The
apparent load to this queue is $\beta$, which is proportional to the ratio of I/O to CPU service rates
and inversely proportional to the probability of having to go to I/O.

The rate at which programs are completed is $p\lambda_1$. We find $\lambda_1$ from the relation between
server utilization and probability of an empty system:

$$1 - \lambda_1/\mu_1 = P[N_1 = 0] = \frac{1 - \beta}{1 - \beta^{I+1}},$$

which implies that

$$\lambda_1 = \mu_1\,\frac{\beta(1 - \beta^{I})}{1 - \beta^{I+1}}.$$
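
The closed-form result of this example, $P[N_1 = i] = (1 - \beta)\beta^{i}/(1 - \beta^{I+1})$ with $\beta = \mu_2/((1 - p)\mu_1)$, can be sketched as follows. The function name is hypothetical, and the special case $\beta = 1$ (a uniform pmf, the limit of the formula) is handled separately as an added assumption to avoid dividing by zero.

```python
def cpu_io_closed_network(customers, p, mu1, mu2):
    """Return (pmf of N1, program completion rate) for the CPU / I/O closed network."""
    beta = mu2 / ((1.0 - p) * mu1)
    if abs(beta - 1.0) < 1e-12:
        # Limit of the truncated geometric pmf as beta -> 1.
        pmf = [1.0 / (customers + 1)] * (customers + 1)
    else:
        pmf = [(1.0 - beta) * beta ** i / (1.0 - beta ** (customers + 1))
               for i in range(customers + 1)]
    lam1 = mu1 * (1.0 - pmf[0])      # utilization relation 1 - lam1/mu1 = P[N1 = 0]
    return pmf, p * lam1


# Parameters of Example 12.26: mu1 = mu2 = 1, p = 1/2, I = 2.
pmf, completion_rate = cpu_io_closed_network(customers=2, p=0.5, mu1=1.0, mu2=1.0)
```

With these parameters $\beta = 2$ and the completion rate evaluates to 3/7, the value obtained by mean value analysis in Example 12.26.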


Example 12.25 


A transmitter (queue 1 in Fig. 12.28) has two permits for message transmission. As long as the
transmitter has a permit ($N_1 > 0$), it generates messages with exponential interarrival times of
rate $\lambda$. The messages enter the transmission system and require an exponential service time at
station 2. As soon as a message arrives at the other side of the transmission system, the
corresponding permit is sent back via station 3. Thus the transmitter can have at most two messages
outstanding in the network at any given time. Find the steady state pmf for the network of
queues. Find the rate at which messages enter the transmission system.


[Figure: blocks labeled Transmitter, Transmission system, Receiver]

FIGURE 12.28
A closed queueing network model for a message transmission system.
We can view the two permits as two customers circulating the queueing network. Since
$P_{12} = P_{23} = P_{31} = 1$, we have that $\pi_1 = \pi_2 = \pi_3 = 1/3$ and thus

$$\lambda_1 = \lambda_2 = \lambda_3 = \frac{\lambda(2)}{3}.$$

The steady state pmf for the network is

$$P[N_1 = i, N_2 = j, N_3 = 2 - i - j] = \frac{(1 - \rho_1)\rho_1^{i}\,(1 - \rho_2)\rho_2^{j}\,(1 - \rho_3)\rho_3^{2-i-j}}{S(2)}
\qquad \text{for } 0 \le i \le 2,\ 0 \le j \le 2 - i,$$

where $\rho_1 = \lambda(2)/3\lambda$ and $\rho_2 = \rho_3 = \lambda(2)/3\mu$. The normalization constant $S(2)$ is obtained by
summing the above joint pmf over all possible states and equating the result to one. There are six
possible network states: (2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1). Thus the
normalization constant is given by

$$\begin{aligned}
S(2) &= (1 - \rho_1)(1 - \rho_2)(1 - \rho_3)\{\rho_1^2 + \rho_2^2 + \rho_3^2 + \rho_1\rho_2 + \rho_1\rho_3 + \rho_2\rho_3\} \\
&= (1 - \rho_1)(1 - \rho_2)^2\{\rho_1^2 + 2\rho_1\rho_2 + 3\rho_2^2\},
\end{aligned}$$

where we have used the fact that $\rho_2 = \rho_3$.
The rate at which messages enter the system is

$$\lambda_1 = \lambda(1 - P[N_1 = 0]),$$

where

$$\begin{aligned}
P[N_1 = 0] &= P[\mathbf{N} = (0, 2, 0)] + P[\mathbf{N} = (0, 0, 2)] + P[\mathbf{N} = (0, 1, 1)] \\
&= \frac{3\rho_2^2}{\rho_1^2 + 2\rho_1\rho_2 + 3\rho_2^2} = \frac{3/\mu^2}{1/\lambda^2 + 2/\lambda\mu + 3/\mu^2}.
\end{aligned}$$
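
The six-state enumeration in this example can be reproduced directly. The sketch below weights each state by $\rho_1^{i}\rho_2^{j}\rho_3^{k}$ using loads proportional to $1/\lambda$, $1/\mu$, $1/\mu$ (the common factor $\lambda(2)/3$ cancels in the normalization). The function name and the numerical rates are assumptions chosen for illustration.

```python
from itertools import product


def permit_network(lam, mu, permits=2):
    """Return (state pmf, rate at which messages enter the transmission system)."""
    loads = [1.0 / lam, 1.0 / mu, 1.0 / mu]      # relative loads of queues 1, 2, 3
    weights = {}
    for state in product(range(permits + 1), repeat=3):
        if sum(state) == permits:
            weights[state] = (loads[0] ** state[0]
                              * loads[1] ** state[1]
                              * loads[2] ** state[2])
    total = sum(weights.values())
    pmf = {state: w / total for state, w in weights.items()}
    p_no_permit = sum(prob for state, prob in pmf.items() if state[0] == 0)
    return pmf, lam * (1.0 - p_no_permit)


# Made-up rates: lam = 0.5 (so 1/lam = 2) and mu = 1.
pmf, entry_rate = permit_network(lam=0.5, mu=1.0)
```

For these rates the closed-form expression gives $P[N_1 = 0] = 3/11$, so messages enter at rate $0.5 \times 8/11 = 4/11$.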


12.9.4 Mean Value Analysis 


Example 12.25 shows that the evaluation of the normalization constant is the funda- 
mental difficulty with closed queueing networks. Fortunately, a method has been de- 
veloped for obtaining certain average quantities of interest without having to 
evaluate this constant. This mean value analysis method is based on the following 
theorem. 


Arrival Theorem 


In a closed queueing network with $I$ customers, the system as seen by a customer arrival to queue
$j$ is the steady state pmf of the same network with one fewer customer.


We have already encountered this result in the discussion of finite-source queue- 
ing systems in Section 12.5. We prove the result in the last part of this section. We now 
use the result to develop the mean value analysis method. 
Let $E[N_j(I)]$ be the mean number of customers in the $j$th queue for a network
that has $I$ customers, let $E[T_j(I)]$ denote the mean time spent by a customer in queue $j$,
and let $\lambda_j(I)$ denote the average customer arrival rate at queue $j$. The mean time spent
by a customer in queue $j$ is his service time plus the service times of the customers he
finds in the queue upon arrival:

$$\begin{aligned}
E[T_j(I)] &= E[\tau_j] + E[\tau_j] \times \text{mean number found upon arrival} \\
&= E[\tau_j] + E[\tau_j]\,E[N_j(I - 1)] \\
&= \frac{1 + E[N_j(I - 1)]}{\mu_j},
\end{aligned} \tag{12.158}$$

where $E[N_j(I - 1)]$ is the mean number found upon arrival by the arrival theorem. By
Little’s formula, the mean number of customers in queue $j$ when there are $I$ in the
network is

$$E[N_j(I)] = \lambda_j(I)\,E[T_j(I)] = \lambda(I)\,\pi_j\,E[T_j(I)]. \tag{12.159}$$

Since the sum of the customers in all queues is $I$ in the previous equation, we have that

$$I = \sum_{j=1}^{K} E[N_j(I)] = \lambda(I) \sum_{j=1}^{K} \pi_j\,E[T_j(I)]. \tag{12.160}$$

Thus

$$\lambda(I) = \frac{I}{\sum_{j=1}^{K} \pi_j\,E[T_j(I)]}. \tag{12.161}$$


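The mean value analysis method combines Eqs. (12.158) through (12.161) in the
following way. First compute $\pi_j$ by solving Eq. (12.148), then for $I = 0$:

$$E[N_j(0)] = 0 \qquad \text{for } j = 1, \ldots, K.$$

For $I = 1, 2, \ldots$:

$$E[T_j(I)] = \frac{1}{\mu_j} + \frac{E[N_j(I - 1)]}{\mu_j} \qquad j = 1, \ldots, K \tag{12.158}$$

$$\lambda(I) = \frac{I}{\sum_{j=1}^{K} \pi_j\,E[T_j(I)]} \tag{12.161}$$

$$E[N_j(I)] = \lambda(I)\,\pi_j\,E[T_j(I)] \qquad j = 1, \ldots, K. \tag{12.159}$$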


Thus the mean value algorithm begins with an empty system and by use of the above 
three equations builds up to a network with the desired number of customers. This 
method has considerably simplified the numerical solution of closed queueing net- 
works and extended the range of network sizes that can be analyzed. 
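
The recursion translates almost line for line into code. The following is a minimal sketch for single-server queues; the function name is hypothetical, and exact fractions are used only so that small cases can be compared with hand calculations.

```python
from fractions import Fraction


def mean_value_analysis(pi, mu, customers):
    """Return (lambda(I), [E[N_j(I)]]) for a closed network of single-server queues."""
    K = len(pi)
    mean_n = [Fraction(0)] * K                  # E[N_j(0)] = 0
    lam = Fraction(0)
    for i in range(1, customers + 1):
        # Eq. (12.158): own service plus service of those found on arrival.
        mean_t = [(1 + mean_n[j]) / mu[j] for j in range(K)]
        # Eq. (12.161): total arrival rate with i customers.
        lam = i / sum(pi[j] * mean_t[j] for j in range(K))
        # Eq. (12.159): Little's formula for each queue.
        mean_n = [lam * pi[j] * mean_t[j] for j in range(K)]
    return lam, mean_n


# Illustrative two-queue case: pi = (2/3, 1/3), unit service rates, I = 2.
pi = [Fraction(2, 3), Fraction(1, 3)]
mu = [Fraction(1), Fraction(1)]
lam_total, mean_n = mean_value_analysis(pi, mu, customers=2)
```

For this input the recursion gives $\lambda(2) = 9/7$ and mean queue lengths $10/7$ and $4/7$, which sum to the two customers in the network.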
Example 12.26


In Example 12.24, let $\mu_1 = \mu_2 = 1$, and $p = 1/2$. Find the rate at which programs are
completed if $I = 2$.

It was already indicated in Example 12.24 that the rate of program completion is
$p\lambda_1(2) = p\pi_1\lambda(2)$. From Eq. (12.152), we have that $\pi_1 = 1/(2 - p) = 2/3$. Thus we only need
to find $\lambda(2)$, the total arrival rate of the network with $I = 2$.

Starting the mean value method with $I = 1$, we have

$$E[T_1(1)] = \frac{1}{\mu_1} = 1 = E[T_2(1)]$$

$$\lambda(1) = \frac{1}{\pi_1 E[T_1(1)] + \pi_2 E[T_2(1)]} = 1$$

$$E[N_1(1)] = \lambda(1)\pi_1 E[T_1(1)] = \frac{2}{3}$$

$$E[N_2(1)] = \lambda(1)\pi_2 E[T_2(1)] = \frac{1}{3}.$$

Continuing with $I = 2$, we have

$$E[T_1(2)] = \frac{1}{\mu_1} + \frac{E[N_1(1)]}{\mu_1} = \frac{5}{3}$$

$$E[T_2(2)] = \frac{1}{\mu_2} + \frac{E[N_2(1)]}{\mu_2} = \frac{4}{3}$$

$$\lambda(2) = \frac{2}{\pi_1 E[T_1(2)] + \pi_2 E[T_2(2)]} = \frac{9}{7}.$$

Thus the program completion rate is

$$p\pi_1\lambda(2) = \frac{3}{7}.$$


You should verify that this is consistent with the results of Example 12.24. 


Example 12.27 


In Example 12.25, let $1/\lambda = a$ and $\mu = 1$. Find the rate at which messages enter the system
when $I = 2$.
We previously found that $\pi_1 = \pi_2 = \pi_3 = 1/3$ and

$$\lambda_1(I) = \lambda_2(I) = \lambda_3(I) = \frac{\lambda(I)}{3}.$$

Starting the mean value method with $I = 1$, we have

$$E[T_1(1)] = a \qquad E[T_2(1)] = E[T_3(1)] = 1$$

$$\lambda(1) = \frac{1}{\pi_1 E[T_1(1)] + \pi_2 E[T_2(1)] + \pi_3 E[T_3(1)]} = \frac{3}{a + 2}$$

$$E[N_1(1)] = \lambda(1)\pi_1 E[T_1(1)] = \frac{a}{a + 2}$$

$$E[N_2(1)] = \lambda(1)\pi_2 E[T_2(1)] = \frac{1}{a + 2} = E[N_3(1)].$$

Continuing with $I = 2$, we have

$$E[T_1(2)] = a + aE[N_1(1)] = \frac{2a^2 + 2a}{a + 2}$$

$$E[T_2(2)] = 1 + 1 \cdot E[N_2(1)] = \frac{a + 3}{a + 2} = E[T_3(2)]$$

$$\lambda(2) = \frac{2}{(1/3)\{(2a^2 + 2a)/(a + 2) + [2(a + 3)/(a + 2)]\}} = \frac{3(a + 2)}{a^2 + 2a + 3}.$$

Finally, messages enter the transmission network at a rate $\lambda_1(2) = \lambda(2)/3$, so

$$\lambda_1(2) = \frac{a + 2}{a^2 + 2a + 3}.$$


You should verify that this is consistent with the results obtained in Example 12.25. 


*12.9.5 Proof of the Arrival Theorem 


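Consider the instant when a customer leaves queue $j$ and is proceeding to queue $k$. We
are interested in the pmf of the system state at these arrival instants. Suppose that at
this instant, with the customer removed from the system, the customer sees the network
in state $\mathbf{n} = (n_1, \ldots, n_K)$. This occurs only when the network state goes from the
state $\mathbf{n}' = (n_1, \ldots, n_j + 1, \ldots, n_K)$ to the state $\mathbf{n}'' = (n_1, \ldots, n_j, \ldots, n_k + 1, \ldots, n_K)$.
Thus:

$$\begin{aligned}
P[\text{customer sees } \mathbf{n} \mid \text{customer goes from } j \text{ to } k]
&= \frac{P[\text{customer sees } \mathbf{n}, \text{customer goes from } j \text{ to } k]}{P[\text{customer goes from } j \text{ to } k]} \\
&= \frac{P[\text{customer goes from } j \text{ to } k \mid \text{state is } \mathbf{n}']\,P[\mathbf{N}(I) = \mathbf{n}']}{P[\text{customer goes from } j \text{ to } k]} \\
&= \frac{\mu_j P_{jk}\,P[\mathbf{N}(I) = \mathbf{n}']}{\mu_j P_{jk}\,P[N_j(I) > 0]} = \frac{P[\mathbf{N}(I) = \mathbf{n}']}{P[N_j(I) > 0]}.
\end{aligned} \tag{12.162}$$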


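To simplify the notation, let us assume that we are dealing with a network of M/M/1
queues, then

$$P[\mathbf{N}(I) = \mathbf{n}'] = \frac{\rho_j \prod_{m=1}^{K} \rho_m^{n_m}}{S'(I)}, \tag{12.163}$$

where $S'(I)$ absorbs all the constants associated with the $P[N_m = n_m]$:

$$S'(I) = \sum_{\mathbf{n}:\, n_1 + \cdots + n_K = I}\ \prod_{m=1}^{K} \rho_m^{n_m}. \tag{12.164}$$

Next, consider the probability that queue $j$ is not empty:

$$\begin{aligned}
P[N_j(I) > 0] &= \sum_{\mathbf{n}:\, n_1 + \cdots + n_K = I - 1} P[N_1 = n_1, \ldots, N_j = n_j + 1, \ldots, N_K = n_K] \\
&= \sum_{\mathbf{n}:\, n_1 + \cdots + n_K = I - 1} \frac{\rho_j \prod_{m=1}^{K} \rho_m^{n_m}}{S'(I)} \\
&= \frac{\rho_j}{S'(I)}\,S'(I - 1),
\end{aligned} \tag{12.165}$$

where we have noted that the above summation is the normalization constant for a
network with $I - 1$ customers, $S'(I - 1)$.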
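Finally, we substitute Eqs. (12.165) and (12.163) into Eq. (12.162):

$$\begin{aligned}
P[\text{customer sees } \mathbf{n} \mid \text{customer goes from } j \text{ to } k]
&= \frac{\rho_j \prod_{m=1}^{K} \rho_m^{n_m} / S'(I)}{\rho_j\,S'(I - 1)/S'(I)} \\
&= \frac{\prod_{m=1}^{K} \rho_m^{n_m}}{S'(I - 1)} \\
&= P[\mathbf{N}(I - 1) = \mathbf{n}],
\end{aligned}$$

which is the steady state probability for $\mathbf{n}$ in a network with $I - 1$ customers. This
completes the proof of the arrival theorem.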
12.10 SIMULATION AND DATA ANALYSIS OF QUEUEING SYSTEMS


In this section we present a basic introduction to the simulation of queueing systems. 
Analytical methods are valuable due to the ease with which they allow us to explore 
the issues and tradeoffs in a given model. Numerical techniques can supplement ana- 
lytical methods and provide additional detailed information, especially when transient 
and dynamic behavior is of interest. However, in many situations analytical and nu- 
merical methods are not sufficient and simulation provides us with a flexible means to 
investigate the behavior of complex systems. In this section we introduce the basic ap- 
proaches available for simulating queueing systems. Throughout our discussion we em- 
phasize the need for careful design of the simulation experiment as well as the need for 
careful application of statistical methods on the observations to draw valid conclusions. 


12.10.1 Approaches to Simulation


The dynamics of a queueing system are represented by one or more random processes, 
so the usual considerations in simulating random processes apply. A very basic option 
is whether a single realization or multiple realizations of the random process are used. 

Multiple realizations that are statistically independent allow us to use the stan- 
dard statistical methods introduced in Chapter 8 to analyze iid random variables, for 
example, to obtain confidence intervals and fit distributions. A single realization of a 
random process allows us a more restricted set of statistical tools and frequently leads 
to methods that attempt to provide a set of observations that are iid so that we can use 
standard tools. In some real experimental situations we may only have one realization 
of the process to work with and so we may have no choice. However, in computer simulation
with proper design, we can usually conduct multiple replications of an experiment
to produce independent observations.⁴ In general, we recommend a pragmatic
approach that uses some replication when possible.

A simulation study based on a single realization usually involves assumptions 
about stationarity and ergodicity so that the behavior of the process over time reveals 
its ensemble averages and probabilities. Examples of such processes are processes with 
stationary independent increments and processes that involve ergodic Markov chains. 
Both of these classes of processes involve initial transient behavior and so we must de- 
cide whether to keep or discard the observations obtained during the initial portion of 
the simulation. If we decide to discard, then we need to somehow identify when the 
transient phase is over and the process has reached steady state. This is not an easy task, 
as discussed extensively in [Pawlikowski], and there are a variety of criteria that can be 
applied for declaring that a system has reached steady state. We note that the use of 
replicated simulations can help characterize the transient phase of a process. (See 
Problem 12.67.) 


⁴ Care should be taken to ensure that the seed in the random number generator is different in each replication.


The design of a simulation must take into account the behavior and parameters 
that we are interested in measuring and observing. Seemingly easy questions such as 
determining state probabilities are not so straightforward. We could be interested in 
the long-term proportion of time the system spends in a state, or the states seen by arriving
customers, or even the state left behind by a departing customer. We have seen that
these quantities need not be the same. The design of the simulation can ease or make 
difficult the measurement of a particular parameter. 

In the remainder of the section we are interested in the parameters of the system 
when it is in steady state, usually either the mean number of customers in the system or 
the long-term proportion of time the system has a certain number of customers. We 
cover the following approaches to simulating a queueing system. 


• Simulation through independent replication;
• Time-sampled process: $\{N(k\delta)\}$;
• Embedded Markov chain and state occupancies: $\{N(t_k), T_k\}$;
• Replication through regenerative cycles.


12.10.2 Simulation through Independent Replications


Simulation through independent replications involves simulating a process $R$ times to
obtain a set of $R$ independent observations $\{X(t, \zeta_1), X(t, \zeta_2), \ldots, X(t, \zeta_R)\}$. We use a
function of the observations to estimate a parameter $\theta$ of the random process:

$$\hat{\theta}(\mathbf{X}_R) = g(X(t, \zeta_1), X(t, \zeta_2), \ldots, X(t, \zeta_R)).$$

For example, to estimate the mean of the process at time $t$ we use:

$$\bar{X}_R(t) = \frac{1}{R} \sum_{r=1}^{R} X(t, \zeta_r). \tag{12.166}$$

To estimate the variance of the process at time $t$ we use:

$$\hat{\sigma}_R^2(t) = \frac{1}{R - 1} \sum_{r=1}^{R} \left(X(t, \zeta_r) - \bar{X}_R(t)\right)^2. \tag{12.167}$$

By design the observations are independent random variables. In order to proceed, we 
also need to assume that the observations are Gaussian random variables. The usual 
approach of taking the sum of a sufficiently large number of variables and using the 
central limit theorem applies. We can also use a statistical test to check that the samples 
are close to Gaussian distributed. Once we have Gaussian observations, we can pro- 
vide the confidence intervals from Eq. (8.58): 


$$\left(\bar{X}_R - t_{\alpha/2,\,R-1}\,\hat{\sigma}_R/\sqrt{R},\ \ \bar{X}_R + t_{\alpha/2,\,R-1}\,\hat{\sigma}_R/\sqrt{R}\right). \tag{12.168}$$


Equation (12.168) is used widely to provide approximate confidence intervals. 

We note that the sample mean and variance estimators in Eqs. (12.166) and 
(12.167) and the associated confidence intervals allow us to identify time dependencies 
in the behavior of the random process. In particular, in the next example, we use them 
to identify the transient phase of a random process that has a steady state. 
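
The replication estimators and the confidence interval can be sketched as follows. This is a minimal illustration under stated assumptions: each replication is a stand-in experiment (the mean of 200 exponential samples, not a queue), each replication gets its own seed, and the Student's t critical value is passed in by the caller because the Python standard library has no t quantile function. The value 2.093 used below is the usual table entry for a 95% interval with 19 degrees of freedom.

```python
import math
import random
import statistics


def confidence_interval(observations, t_critical):
    """Interval of Eq. (12.168) from R independent observations."""
    r = len(observations)
    mean = statistics.mean(observations)       # Eq. (12.166)
    std = statistics.stdev(observations)       # square root of Eq. (12.167)
    half_width = t_critical * std / math.sqrt(r)
    return mean - half_width, mean + half_width


def one_replication(seed, samples=200):
    """Stand-in experiment: sample mean of unit-mean exponential variables."""
    rng = random.Random(seed)                  # a different seed per replication
    return statistics.mean(rng.expovariate(1.0) for _ in range(samples))


observations = [one_replication(seed) for seed in range(20)]
low, high = confidence_interval(observations, t_critical=2.093)
```

Replacing `one_replication` with a full queue simulation that returns the quantity of interest leaves the interval calculation unchanged.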


When the random process is a continuous function of time, the estimator can take
the form of an integral. For example, for a Markov chain process we can estimate either
the time average of the process or the proportion of time in state $j$ in the $r$th replication
by an integral over an interval of time $T$:

$$\bar{N}_r = \frac{1}{T} \int_0^T N(t, \zeta_r)\,dt \quad \text{and} \quad \hat{p}_j^{(r)} = \frac{1}{T} \int_0^T I_j(N(t, \zeta_r))\,dt. \tag{12.169}$$

$\{\bar{N}_r\}$ and $\{\hat{p}_j^{(r)}\}$ provide the independent random variables that can be used to obtain
a confidence interval for the time average of $N(t)$ and the proportion of time that
$N(t) = j$.


12.10.3 Time-Sampled Process Simulation


A simple approach to simulating continuous-time queueing systems is to use time-sampled
process simulation. The time axis is divided into small intervals of length $\delta$ and a
discrete-time process is simulated. The following example demonstrates the approach.


Example 12.28 Transient of M/M/1 Queue Using Sampled-Time Approximation 


Investigate the transient behavior of $N(t)$, the number of customers in an M/M/1 queueing
system, using a sampled-time approach. Assume the system is initially empty. Generate 2000 steps
of $\delta = 0.1$ seconds with $\mu = 1$ job/second and run two cases: $\lambda = 0.5$ and $\lambda = 0.9$ jobs/second.
Replicate the simulation 20 times and plot the sample mean of the process across the 20
replications (Eq. 12.166). Find the covariance function for each realization and plot the average of the
covariance functions across the 20 replications.

The sampled-time approximation involves simulating a system in small steps of $\delta$ seconds.
For a birth-death process (such as the M/M/1 queue) in state $j > 0$, three outcomes can occur in
$\delta$ seconds: (1.) no arrival and no departure occur with probability $1 - (\lambda_j + \mu_j)\delta$; (2.) one
arrival occurs with probability $\lambda_j\delta$; (3.) one departure occurs with probability $\mu_j\delta$. We can adjust
for the $j = 0$ state by letting $\mu_0 = 0$, and the $j = N_{\max}$ state by letting $\lambda_{N_{\max}} = 0$. Note that the
state-transition diagram of this sampled-time queueing system has the structure of the
discrete-time Markov chain in Example 11.49. We use the code for that example to generate 20
realizations of 2000 steps of $N(k\delta)$, which corresponds to 200 seconds of time.

Figure 12.29(a) shows the sample mean of 20 realizations of $N(t)$. Note that this sample
mean averages over 20 processes that can each exhibit a lot of variation, see Figs. 11.20 and 11.21.
Consequently the averaged realizations still exhibit quite a bit of variation. The lower curve
corresponds to $\rho = 0.5$, which can be seen to reach and vary about the true mean of $E[N] = 1$ after
about 100 steps (10 seconds). The higher curve corresponds to $\rho = 0.9$, which is a much higher
utilization. The true mean in this case is $E[N] = 9$ and it can be seen that the average of the
realizations does not reach the area of the mean until about 1400 steps. Thus we see that the
transient period increases dramatically as the utilization approaches 1.

Figure 12.29(b) shows the sample mean of the normalized covariance functions of the 20
realizations of $N(t)$. For $\rho = 0.5$, the autocovariance does not reach 0 until about 200 steps.
Furthermore, for $\rho = 0.9$, the autocovariance is approximately 0.6 after 200 steps. This much
longer sustained correlation is another indicator of the increase in transient time as utilization
is increased.
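
The sampled-time procedure of this example can be sketched as below. This is not the code of Example 11.49; it is a stand-in written for illustration, with hypothetical function names, an assumed cap `n_max` on the queue length, and one seed per replication.

```python
import random


def simulate_sampled_mm1(lam, mu, delta, steps, rng, n_max=10**6):
    """One realization of N(k*delta) for an M/M/1 queue that starts empty."""
    n = 0
    path = []
    for _ in range(steps):
        p_arrival = lam * delta if n < n_max else 0.0   # no arrivals at the cap
        p_departure = mu * delta if n > 0 else 0.0      # no departures when empty
        u = rng.random()
        if u < p_arrival:
            n += 1
        elif u < p_arrival + p_departure:
            n -= 1
        path.append(n)
    return path


def replicated_sample_mean(lam, mu, delta, steps, replications, base_seed=0):
    """Average the realizations step by step across replications (Eq. 12.166)."""
    paths = [simulate_sampled_mm1(lam, mu, delta, steps, random.Random(base_seed + r))
             for r in range(replications)]
    return [sum(p[k] for p in paths) / replications for k in range(steps)]


sample_mean = replicated_sample_mean(lam=0.5, mu=1.0, delta=0.1, steps=2000, replications=20)
```

Plotting `sample_mean` against the step index for $\lambda = 0.5$ and $\lambda = 0.9$ gives curves of the kind described for Fig. 12.29(a); the exact shape depends on the seeds.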
FIGURE 12.29
(a) Transient of M/M/1 queue using sampled-time approach, $\rho = 0.5, 0.9$; (b) normalized covariance of M/M/1 queue,
$\rho = 0.5, 0.9$.


In order to approximate the queueing process accurately, the time-sampled approach
requires that we use a small step size. In addition to possibly increasing the
amount of computation required to perform the simulation, a small step size has
the effect of making more adjacent samples highly correlated. This is clearly evident in
the observed autocovariance function in the above example.


The correlation of samples poses a problem in estimating parameters of a queueing
process from a single realization. Suppose we are interested in estimating the mean
of {N(kδ)} from a single realization of the process:

$$\bar{N}_n = \frac{1}{n}\sum_{k=1}^{n} N(k\delta). \tag{12.170}$$

The terms in the series {N(kδ)} are correlated, so from Eq. (9.108), assuming that
the process is wide sense stationary, the variance of the sample mean is then larger than
it would be for iid samples:

$$\mathrm{VAR}[\bar{N}_n] = \frac{1}{n}\left\{ C_N(0) + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right) C_N(k) \right\} \tag{12.171}$$


where $C_N(k)$ is the covariance function of N(t). Only $C_N(0)$, which corresponds to
the variance of N, would be present if the observations were uncorrelated.
Example 12.28 demonstrated how N(t) in queueing systems can maintain significant
correlation for significant periods of time. The example also illustrated how
the process N(t) becomes more correlated as the utilization increases. As discussed
in Examples 9.49 and 9.50, the net effect is that the convergence of the sample
mean to E[N] is slower than if the samples were independent. This larger variance
can be taken into account by gathering estimates for the covariance terms $C_N(k)$
and using Eq. (12.168) in the calculation of confidence intervals. (See [Law, p. 556]
for a discussion on such confidence intervals.)
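
The effect of correlation on the variance of the sample mean can be explored numerically. The following Python sketch (Python is used here for illustration; it is not the Octave code of the text) evaluates the variance expression of Eq. (12.171) for a supplied covariance sequence. The geometric covariance model `geometric_cov` and its parameter values are hypothetical choices made only to have something to plug in, and lags beyond the supplied sequence are assumed to have zero covariance.

```python
def sample_mean_variance(cov, n):
    """Variance of the mean of n samples of a WSS sequence, per Eq. (12.171).

    cov[k] is the covariance at lag k. Lags beyond len(cov) - 1 are treated
    as zero, which is an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = cov[0]
    for k in range(1, min(n, len(cov))):
        total += 2.0 * (1.0 - k / n) * cov[k]
    return total / n


def geometric_cov(variance, r, max_lag):
    """Hypothetical covariance model C(k) = variance * r**k, for illustration only."""
    return [variance * r ** k for k in range(max_lag + 1)]


if __name__ == "__main__":
    n = 1000
    iid = sample_mean_variance([2.0], n)                          # uncorrelated case
    corr = sample_mean_variance(geometric_cov(2.0, 0.9, n - 1), n)
    print(f"iid: {iid:.5f}  correlated: {corr:.5f}  ratio: {corr / iid:.1f}")
```

With positively correlated samples the ratio printed by the script is greater than 1, which is the variance inflation that the confidence interval calculation has to account for.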

The relative frequencies of the states provide estimates for the long-range proportion
of time spent in each state:

$$\hat{p}_j = \frac{1}{n}\sum_{k=1}^{n} I_j(k\delta) \tag{12.172}$$

where $I_j$ is the indicator function for the event {N(kδ) = j}. Relative frequencies
are a special case of sample means so the same cautions regarding the variance of the
estimates and convergence rates apply.

The method of batch means, introduced in Section 8.4, provides an approach to 
dealing with the correlation among samples. A long simulation run is divided into mul- 
tiple segments that are sufficiently long that the samples from different segments have 
low correlation. The parameter estimates from different segments, e.g., sample mean or
relative frequencies, are treated as independent random variables and the standard 
statistical tools are applied to the batch means and batch relative frequencies. 


Example 12.29 Confidence Intervals Using Batch Means 


Use the method of batch means to estimate the mean of the M/M/1 queue when λ = 0.5 and
μ = 1 job per second. Each realization should consist of 8 batches of 600 steps. Replicate each
simulation five times.

Five replications of 5000-step realizations were carried out. The first 200 samples from 
each realization were discarded to remove bias from the initial transient. The remaining 4800 
samples in each realization were divided into 8 batches. Table 12.2(a) shows the means for each 
of the resulting 40 batches. For each realization the sample mean and sample standard deviation 
for the 8 batch means were calculated and are shown in Table 12.2(b). Confidence intervals were 
then calculated for each realization. For a 90% confidence level (α = 10%), $t_{\alpha/2,7} = 1.8946$ and
$\delta = t_{\alpha/2,7}\,\sigma/\sqrt{8}$. The upper and lower limits of the confidence interval for the mean of the process
are given in the two rightmost columns of Table 12.2(b). Every confidence interval contains the
value 1, which is the expected value of the M/M/1 queue when ρ = 1/2.


TABLE 12.2a Sequence of batch means for five replications

r/b        1        2        3        4        5        6        7        8
1    0.84500  0.70667  0.51500  4.57167  0.30500  3.56000  1.75167  0.91167
2    0.83000  0.66000  0.97667  1.21833  1.14667  1.16333  2.39833  0.61000
3    0.96000  0.55333  0.89833  0.62500  0.31000  3.39167  0.86167  0.43333
4    2.73333  1.06167  0.62167  0.45667  2.17333  1.30000  0.57667  0.88167
5    1.14000  0.85667  0.82500  1.07167  0.67833  1.02167  1.08833  1.44667


TABLE 12.2b Confidence interval for mean for each of five replications

r      Mean        σ        δ    Lower    Upper
1    1.6458  1.57547  1.05532  0.59052  2.7011
2    1.1254  0.56347  0.37744  0.74798  1.5029
3    1.0042  0.99199  0.66448  0.33969  1.6686
4    1.2256  0.81934  0.54883  0.67679  1.7745
5    1.0160  0.23455  0.15711  0.85893  1.1732

TABLE 12.2c Sequence of batch confidence intervals across five replications

Batch       1       2       3        4       5       6       7       8
Mean   1.3017  0.7677  0.7673   1.5887  0.9227  2.0873  1.3353  0.8567
σ      0.8099  0.1972  0.1931   1.6965  0.7796  1.2727  0.7356  0.3846
δ      0.7721  0.1880  0.1841   1.6174  0.7432  1.2134  0.7013  0.3667
Upper  2.0738  0.9557  0.9515   3.2061  1.6659  3.3007  2.0366  1.2234
Lower  0.5296  0.5796  0.5832  -0.0287  0.1795  0.8739  0.6340  0.4900


Table 12.2(c) gives the 90% confidence interval that is calculated for the batch means 
across different replications. These batches are truly independent and will not be affected by cor- 
relation effects. It is important to determine whether any evidence of bias exists in the earlier 
batches due to the initial transient phase. It can be seen that the second and third columns do not 
include the value 1 by a small margin. 

We also calculated a 90% confidence interval for the combined 40 batches and obtained
(1.2034 − 0.24575, 1.2034 + 0.24575) = (0.95765, 1.449). Finally, we calculated a 90% confidence
interval based on the sample means of the 5 realizations, and obtained
(1.2034 − 0.25096, 1.2034 + 0.25096) = (0.95244, 1.4544). Note that the latter 5 realizations
are truly independent and constitute a pure application (no batching) of the replication method.
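
The per-replication interval calculation in this example can be sketched in a few lines of Python (used here for illustration in place of the Octave of the text). The function names are hypothetical, and the Student-t critical value is passed in as an argument rather than computed, so that the sketch needs only the standard library. The data in the usage portion are the eight batch means of replication 5 in Table 12.2(a).

```python
import math
import statistics


def ci_from_batch_means(means, t_quantile):
    """Treat the batch means as iid and return (center, half_width)."""
    center = statistics.fmean(means)
    sigma = statistics.stdev(means)  # sample standard deviation, n - 1 divisor
    return center, t_quantile * sigma / math.sqrt(len(means))


def batch_means_ci(samples, num_batches, t_quantile, warmup=0):
    """Discard `warmup` samples, split the rest into equal batches, build the interval.

    t_quantile must be the critical value for num_batches - 1 degrees of freedom.
    Leftover samples that do not fill a batch are ignored.
    """
    data = samples[warmup:]
    size = len(data) // num_batches
    if size == 0:
        raise ValueError("not enough samples for the requested number of batches")
    means = [statistics.fmean(data[b * size:(b + 1) * size]) for b in range(num_batches)]
    return ci_from_batch_means(means, t_quantile)


if __name__ == "__main__":
    # Replication 5 of Table 12.2(a); t = 1.8946 for 7 degrees of freedom at 90%.
    row5 = [1.14000, 0.85667, 0.82500, 1.07167, 0.67833, 1.02167, 1.08833, 1.44667]
    mean, delta = ci_from_batch_means(row5, 1.8946)
    print(f"mean = {mean:.4f}, interval = ({mean - delta:.5f}, {mean + delta:.5f})")
```

Applied to a raw realization, `batch_means_ci(samples, 8, 1.8946, warmup=200)` would mirror the 200-sample discard and 8-batch split described in the example.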


12.10.4 Simulation Using Embedded Markov Chains 


Many queueing systems have natural embedding points that lead to discrete-time 
Markov chains. We saw in Chapter 11 that queueing systems that are modeled by 
continuous-time Markov chains can be defined in terms of an embedded Markov 
chain and exponentially distributed state occupancy times. In this chapter we saw 
that the distribution of the steady state number of customers in an M/G/1 system can 
also be observed through an embedded Markov chain. In this section we discuss 
simulation based on embedded Markov chains. 

First, let N(t) be the number of customers in a queueing system that is modeled
by a continuous-time Markov chain. The transition rate matrix Γ for the process provides
us with the transition probabilities of the embedded chain as well as the state occupancy
times (see Eq. 11.35). In Example 11.50 we used this approach to generate
realizations of an M/M/1 queue. The output of this simulation is a sequence of states
$\{N_i\}$ and the corresponding state occupancy times $\{T_i\}$. The relative frequencies obtained
from the sequence of states provide us with an estimate for the state probabilities
$\{\pi_j\}$ of the embedded Markov chain. The occupancy times according to their
corresponding state, e.g., $\{T_k(j), k = 1, \ldots, n_j\}$ for state j, can also provide us with an
estimate for the state occupancy times. We can obtain an estimate for the mean of N(t)
directly:

$$\hat{N} = \frac{1}{T}\int_0^T N(t)\,dt = \frac{1}{T}\sum_{i} N_i T_i. \tag{12.173}$$

An estimate for long-term proportion of time in state j is obtained similarly:

$$\hat{p}_j = \frac{1}{T}\int_0^T I_j(t)\,dt = \frac{1}{T}\sum_{k=1}^{n_j} T_k(j). \tag{12.174}$$

If the Markov chains that model the system are ergodic, then the above estimates will 
converge to the correct steady state values. 
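
A minimal Python sketch of these two estimators is given here, together with a small embedded-chain simulator for the M/M/1 case. It is an illustration under stated assumptions, not the code of Example 11.50: the function names are hypothetical, the queue starts empty, and no initial transient is discarded.

```python
import random
from collections import defaultdict


def time_averages(states, times):
    """Eqs. (12.173) and (12.174): time-average mean and fraction of time per state."""
    total = sum(times)
    mean = sum(s * t for s, t in zip(states, times)) / total
    in_state = defaultdict(float)
    for s, t in zip(states, times):
        in_state[s] += t                      # accumulate occupancy time of state s
    return mean, {s: t / total for s, t in in_state.items()}


def simulate_mm1(lam, mu, horizon, rng):
    """Return (states, occupancy times) of an M/M/1 queue observed on [0, horizon]."""
    states, times = [], []
    n, clock = 0, 0.0
    while True:
        rate = lam if n == 0 else lam + mu    # total transition rate out of state n
        hold = rng.expovariate(rate)          # exponential state occupancy time
        if clock + hold >= horizon:           # truncate the last visit at the horizon
            states.append(n)
            times.append(horizon - clock)
            return states, times
        states.append(n)
        times.append(hold)
        clock += hold
        # Embedded chain: from state 0 move up; otherwise up w.p. lam / (lam + mu).
        if n == 0 or rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1


if __name__ == "__main__":
    states, times = simulate_mm1(0.75, 1.0, 1800.0, random.Random(1))
    mean, frac = time_averages(states, times)
    print(f"estimated E[N] = {mean:.2f}, estimated P[N = 0] = {frac.get(0, 0.0):.3f}")
```

Because each visit is weighted by its occupancy time, the estimates are time averages of N(t) rather than averages over the embedded chain.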


Example 12.30 M/M/1 Steady State Probabilities Using Embedded Markov Chain 


Use the embedded Markov chain approach to estimate the state probabilities in an M/M/1 system
with λ = 0.75 and μ = 1. Calculate the proportion of time spent in each state and obtain
confidence intervals for these values by using replication.

The code in Example 11.50 can be modified to calculate Eq. (12.174) by accumulating the 
total time spent in each state as the simulator generates each new state and occupancy time. 
Each realization was 1800 seconds in duration, but no data was gathered during the first 300 
seconds of the simulation. Eight pmf estimates were obtained and the sample mean and stan- 
dard deviation as well as a 90% confidence interval for each state probability were computed 
using the eight independent estimates from the replication. The results are shown in Fig. 12.30. 
It can be seen that there is generally good agreement between the theoretical pmf and the con- 
fidence intervals. 

FIGURE 12.30
Confidence intervals for steady state M/M/1 pmf.

The following example shows that we can simulate an M/G/1 system using another
type of embedded Markov chain.


Example 12.31 Simulating M/G/1 Using Embedded Markov Chains 


Section 12.7 showed that the steady state distribution for the number of customers in an M/G/1
system is the same as the distribution for the number left behind by a customer departure.
Furthermore, the number of customers left behind by the jth customer departure, $N_j$, forms a
discrete-time Markov chain as follows:

$$N_j = (N_{j-1} - 1)^+ + M_j \tag{12.175}$$

where $M_j$ is the number of arrivals during the service time of the jth customer and where
$(x)^+ = \max(0, x)$.


Therefore we can obtain the steady state pmf for N(t) in an M/G/1 system by finding the transi- 
tion probability matrix associated with Eq. (12.175) and applying the methods developed in 
Section 11.6. We explore this approach further in the problems. 
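
As a complement to the transition-matrix approach, the recursion in Eq. (12.175) can also be iterated directly to estimate the pmf by simulation. The Python sketch below does this under stated assumptions: the function names are hypothetical, the system starts empty, no transient is discarded, and the usage portion picks a constant service time (an M/D/1 case) purely as an example of a general service distribution.

```python
import random


def next_state(n_prev, arrivals):
    """Eq. (12.175): number left behind by the next departure."""
    return max(n_prev - 1, 0) + arrivals


def poisson_arrivals(lam, service_time, rng):
    """Count arrivals of a rate-lam Poisson process during one service time."""
    count, elapsed = 0, rng.expovariate(lam)
    while elapsed <= service_time:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


def departure_pmf(lam, draw_service, num_departures, rng):
    """Relative frequencies of the number left behind over num_departures departures."""
    counts, n = {}, 0
    for _ in range(num_departures):
        n = next_state(n, poisson_arrivals(lam, draw_service(rng), rng))
        counts[n] = counts.get(n, 0) + 1
    return {k: v / num_departures for k, v in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical M/D/1 case: constant service time of 1 second, lam = 0.5.
    pmf = departure_pmf(0.5, lambda rng: 1.0, 50_000, random.Random(7))
    print({k: round(v, 3) for k, v in pmf.items() if k <= 4})
```

Any service-time sampler can be passed as `draw_service`, for example `lambda rng: rng.expovariate(1.0)` to recover the M/M/1 case.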


Next we introduce Lindley’s recursion for the waiting time in a G/G/1 system as a
final application of embedded Markov chain methods. Assume that the customer in- 
terarrival times and service times are independent random variables with arbitrary dis- 
tributions. We focus on the waiting time experienced by an arriving customer and we 
show that the sequence of waiting times forms a Markov chain. 

Let $a_1, a_2, \ldots$ denote the customer interarrival times and let $\tau_1, \tau_2, \ldots$ be their
corresponding service times. Let $W_n$ be the waiting time of the nth customer. Suppose
the (n + 1)st customer arrives to a nonempty system, as shown in Fig. 12.31(a). Note
that we must have:

$$W_n + \tau_n = a_{n+1} + W_{n+1}$$

in order for the arriving customer to find a nonempty system. It then follows that the
waiting time for the (n + 1)st customer must be given by:

$$W_{n+1} = W_n + \tau_n - a_{n+1} \quad \text{if } W_n + \tau_n - a_{n+1} \ge 0. \tag{12.176a}$$


FIGURE 12.31
Customer arrivals and departures in G/G/1 queue.

On the other hand, the arriving customer finds an empty system (Fig. 12.31b) under the
following conditions:

$$W_{n+1} = 0 \quad \text{if } W_n + \tau_n - a_{n+1} < 0. \tag{12.176b}$$

Therefore we conclude that the sequence of waiting times is given by Lindley’s recursion:

$$W_{n+1} = \max(0,\; W_n + \tau_n - a_{n+1}). \tag{12.177}$$

$W_{n+1}$ depends on the past only through $W_n$, $\tau_n$, and $a_{n+1}$. Since $\tau_n$ and $a_{n+1}$ are from
iid sequences and are independent of each other, we conclude that the sequence $\{W_n\}$ is a Markov
process with stationary transition probabilities. Note that $W_n$ assumes a continuum of
values. We can generate the sequence of total delays experienced by the sequence of
customers as follows: $T_n = W_n + \tau_n$.

Equation (12.177) can be used to derive an integral equation for the steady state 
waiting time of customers in a G/G/1 system [Kleinrock, p. 282]. The equation is similar 
to the Wiener-Hopf equation we encountered in Section 10.4 and usually requires transform
methods to solve. However, Eq. (12.177) is remarkably simple to use in simulations.
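
To illustrate how little is needed, the following Python sketch implements Eq. (12.177) (Python is used for illustration; the text's own examples use Octave). The function name and the 0-based indexing are choices of this sketch, and the first customer is assumed to arrive to an empty system so that its waiting time is zero.

```python
import random


def lindley_waits(interarrivals, services):
    """Waiting times from Lindley's recursion, Eq. (12.177).

    services[n] is the service time of customer n (0-based) and
    interarrivals[n] is the time between the arrivals of customers n and n + 1.
    The first customer is assumed to find the system empty.
    """
    waits = [0.0]
    for a_next, tau in zip(interarrivals, services):
        waits.append(max(0.0, waits[-1] + tau - a_next))
    return waits


if __name__ == "__main__":
    rng = random.Random(3)
    lam, mu, customers = 0.9, 1.0, 2000
    services = [rng.expovariate(mu) for _ in range(customers)]
    gaps = [rng.expovariate(lam) for _ in range(customers - 1)]
    waits = lindley_waits(gaps, services)
    delays = [w + s for w, s in zip(waits, services)]   # total delay = wait + service
    print(f"mean wait = {sum(waits) / customers:.2f}, "
          f"mean delay = {sum(delays) / customers:.2f}")
```

Since the recursion makes no use of the distributions, the same function applies unchanged to any G/G/1 system once the two input arrays are generated.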


Example 12.32 Estimating Waiting Time Distribution Using Lindley’s Recursion 


Estimate the distribution of the customer waiting times in an M/M/1 queue when λ = 0.9 and
μ = 1 job per second. Compare the empirical cdf of the observed total time in the system with
the theoretical distribution.

Lindley’s recursion can be readily implemented in Octave. Arrays of exponential interarrival
times with λ = 0.9 and service times with μ = 1 job per second are generated initially. Lindley’s
recursion is then used to compute the sequence of waiting times and total delays for the sequence of
customers. The Octave function empirical_cdf is used to obtain the cdf of the observations. In the
simulation a sequence of 2000 waiting and total times was collected and no data was deleted to
allow for an initial transient period. Figure 12.32 compares the empirical cdf with the distribution for
waiting time in an M/M/1 system with ρ = 0.9. A test such as the Kolmogorov-Smirnov test can be
applied to assess goodness of fit of the empirical distribution to the hypothetical distribution.
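
For reference, the theoretical curves used in a comparison of this kind are the standard M/M/1 first come, first served results of Section 12.3, restated here with ρ = λ/μ rather than derived:

$$F_W(x) = P[W \le x] = 1 - \rho\, e^{-\mu(1-\rho)x}, \qquad F_T(x) = P[T \le x] = 1 - e^{-\mu(1-\rho)x}, \qquad x \ge 0.$$

With λ = 0.9 and μ = 1 these become $F_W(x) = 1 - 0.9e^{-0.1x}$ and $F_T(x) = 1 - e^{-0.1x}$. Note that $F_W$ has a jump of height 1 − ρ = 0.1 at x = 0, which corresponds to customers that arrive to an empty system and do not wait.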

FIGURE 12.32
Empirical cdf of M/M/1 queue using Lindley’s recursion, ρ = 0.9.


12.10.5 Replication through Regenerative Cycles


In Section 7.5 we considered renewal processes where time is divided into intervals according
to an iid sequence of positive random variables $\{X_i\}$. We associated with each
interval $X_i$ a cost $C_i$. We then proved the following result, Eq. (7.47):

$$\lim_{t\to\infty} \frac{1}{t}\sum_{i=1}^{M(t)} C_i = \frac{E[C]}{E[X]} \tag{12.178}$$

where M(t) is the number of cycles completed by time t, E[C] is the average cost per cycle,
and E[X] is the mean cycle length.

The regenerative method for simulation involves finding renewal points in a 
queueing system where the process “restarts” itself so that its future is independent of 
the past. For example, in many queueing systems this renewal or regeneration occurs 
when a customer arrives to an empty system. Measurements taken during different cy- 
cles are then independent random variables. Thus in effect the regenerative method 
partitions a single simulation into a number of independent replications. 

The long-term time average of C(t) in Eq. (12.178) is given by the ratio of the
sample mean for the measurements for C and the sample mean for X. For example, if
we are interested in the probability that the system is in state j, then we let $C_i$ be the
time the system is in state j during the ith cycle:

$$C_i = \int_{R_{i-1}}^{R_i} I_j(t)\,dt = \sum_{k=1}^{n_i(j)} T_k^i(j) \tag{12.179}$$

where $n_i(j)$ is the number of times state j occurred during the ith cycle and $T_k^i(j)$ is the
occupancy time of the kth occurrence of state j during the ith cycle. The corresponding
estimate for the proportion of time in state j is:


$$\hat{p}_j = \frac{\dfrac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{n_i(j)} T_k^i(j)}{\dfrac{1}{n}\sum_{i=1}^{n} X_i}. \tag{12.180}$$

On the other hand if we are interested in the mean of N(t), we let

$$C_i = \int_{R_{i-1}}^{R_i} N(t)\,dt = \sum_{k=1}^{n_i} N_k^i T_k^i \tag{12.181}$$


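where $n_i$ is the number of states visited during the ith cycle. The corresponding estimate
for the mean is:

$$\hat{N} = \frac{\dfrac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{n_i} N_k^i T_k^i}{\dfrac{1}{n}\sum_{i=1}^{n} X_i}. \tag{12.182}$$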

The numerators and denominators in Eqs. (12.180) and (12.182) individually are
strongly consistent estimators for their corresponding parameters. Therefore the estimators
formed by taking their ratios in Eqs. (12.180) and (12.182) are also strongly
consistent. Note, however, that the ratios provide biased estimates. We discuss confidence
intervals after the following example.


Example 12.33 Regenerative Method for M/M/1 Simulation 


Estimate the mean waiting time of customers in the system in Example 12.28 using the regenerative
method to analyze the sequence of waiting times produced by Lindley’s recursion.

Let a cycle consist of the time from when a customer arrives to an empty system until
the next time a customer arrives to an empty system. We are interested in the average waiting
time experienced by customers over a long period of time. Suppose we measure the number of
customers serviced in a sequence of cycles $\{N_c(i)\}$, and the total of the waiting times of all
customers in the cycle $\{W_{agg}(i)\}$. Each of these sequences is iid and so the sample mean of each
will converge to its respective mean. The ratio of the two sample means provides an estimate
for the mean waiting time (see Problem 12.78):

$$\hat{W} = \frac{\dfrac{1}{n}\sum_{i=1}^{n} W_{agg}(i)}{\dfrac{1}{n}\sum_{i=1}^{n} N_c(i)}. \tag{12.183}$$

It is easy to prepare a simulation to gather $\{N_c(i)\}$, $\{W_{agg}(i)\}$, and the sequence of cycle
durations $\{X_i\}$ using Lindley’s recursion because each regeneration point is marked by arriving
customers that have zero waiting time. The resulting sequences can be parsed according to their
respective cycles and the above cycle statistics can then be gathered.

A simulation with 4000 customer arrivals to an M/M/1 system with λ = 0.9 and μ = 1
was conducted and the results in Table 12.3 were obtained. The 4000 arrivals produced 366 cycles.
The ratio of the mean number of customers serviced in a cycle to the mean cycle duration
gives the following estimate for the arrival rate:

Arrival Rate Estimate = 10.842/11.913 = 0.91,

which is close to λ = 0.9. The estimate for the mean waiting time obtained from the ratio in
Eq. (12.183) was 8.80. From Eq. (9.27) the mean waiting time for this M/M/1 queue is
E[W] = 9, which again is quite close.


TABLE 12.3 Per regenerative cycle statistics for M/M/1 queue

M/M/1 Mean Waiting Time
L = 4000                  TotCycle = 366
MeanCycle = 11.913        STDCycle = 41.374
MeanCount = 10.842        STDCount = 39.236
MeanCycleWait = 95.424    STDCycleWait = 612.20
MeanWait = 8.8017
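
The parsing of a waiting-time sequence into regenerative cycles can be sketched as follows in Python (for illustration; this is not the program that produced Table 12.3). The function names are hypothetical. The sketch assumes that a cycle starts at every customer whose waiting time is exactly zero, and it discards any customers before the first regeneration point as well as the unfinished cycle at the end of the sequence.

```python
def regenerative_cycles(waits):
    """Split a waiting-time sequence into completed regenerative cycles.

    Returns two lists with one entry per completed cycle: the number of
    customers in the cycle and the total of their waiting times.
    """
    starts = [i for i, w in enumerate(waits) if w == 0]   # regeneration points
    counts, totals = [], []
    for begin, end in zip(starts, starts[1:]):
        counts.append(end - begin)
        totals.append(sum(waits[begin:end]))
    return counts, totals


def mean_wait_estimate(counts, totals):
    """Ratio estimator of Eq. (12.183): the 1/n factors cancel in the ratio."""
    return sum(totals) / sum(counts)


if __name__ == "__main__":
    # Small hand-made sequence: cycles are [0, 1, 2], [0], and [0, 3].
    waits = [0, 1, 2, 0, 0, 3, 0]
    counts, totals = regenerative_cycles(waits)
    print(counts, totals, mean_wait_estimate(counts, totals))
```

The per-cycle lists returned by `regenerative_cycles` are the iid observations from which cycle means and standard deviations of the kind listed in Table 12.3 would be computed.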




Of course the whole point of striving to get independent observations is to produce
confidence intervals. In [Law, p. 559] an approximate confidence interval is developed
for an estimator of the form in Eq. (12.183). The pairs $(W_{agg}(i), N_c(i))$ form an iid
sequence but in general $W_{agg}(i)$ and $N_c(i)$ are correlated. It can be shown that for large n
the estimator in Eq. (12.183) is asymptotically Gaussian with mean E[W] and a variance that
is estimated using:

$$\hat{\sigma}^2(n) = \hat{\sigma}^2_{W_{agg}}(n) - 2\hat{W}(n)\,\hat{\sigma}_{W_{agg},N_c}(n) + \big(\hat{W}(n)\big)^2\,\hat{\sigma}^2_{N_c}(n) \tag{12.184}$$

where $\hat{\sigma}_{W_{agg},N_c}(n)$ is the estimator for the covariance of $W_{agg}(i)$ and $N_c(i)$. This result leads
to the following confidence interval:

$$\left( \hat{W} - \frac{z_{1-\alpha/2}\sqrt{\hat{\sigma}^2(n)/n}}{\bar{N}_c},\;\; \hat{W} + \frac{z_{1-\alpha/2}\sqrt{\hat{\sigma}^2(n)/n}}{\bar{N}_c} \right) \tag{12.185}$$

where $\bar{N}_c$ is the sample mean of the $N_c(i)$. The required estimates for the variances and
covariances of $W_{agg}(i)$, $N_c(i)$ can be made from the per-cycle statistics.
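
A minimal Python sketch of this interval calculation is given here. It follows the commonly stated form of the regenerative ratio interval: the estimate is the ratio of the per-cycle sample means, the variance term combines the sample variances and sample covariance of the per-cycle pairs, and the half-width is divided by the sample mean of the cycle counts. The function name is hypothetical, the normal quantile `z` is supplied by the caller (for example 1.645 for a 90% interval), and sample statistics use the n − 1 divisor, which is an assumption of this sketch.

```python
import math
import statistics


def ratio_confidence_interval(w_agg, n_c, z):
    """Return (estimate, half_width) for the ratio of means of per-cycle data.

    w_agg[i] is the total waiting time in cycle i and n_c[i] is the number of
    customers served in cycle i. z is the standard normal quantile z_{1-alpha/2}.
    """
    n = len(w_agg)
    w_bar = statistics.fmean(w_agg)
    n_bar = statistics.fmean(n_c)
    estimate = w_bar / n_bar
    var_w = statistics.variance(w_agg)
    var_n = statistics.variance(n_c)
    cov_wn = sum((w - w_bar) * (m - n_bar) for w, m in zip(w_agg, n_c)) / (n - 1)
    sigma2 = var_w - 2.0 * estimate * cov_wn + estimate ** 2 * var_n
    # Guard against a tiny negative value caused by floating-point rounding.
    half_width = z * math.sqrt(max(sigma2, 0.0) / n) / n_bar
    return estimate, half_width


if __name__ == "__main__":
    # Made-up per-cycle data, for illustration only.
    totals = [3.0, 0.0, 3.5, 12.0, 0.5]
    counts = [3, 1, 2, 6, 2]
    est, hw = ratio_confidence_interval(totals, counts, 1.645)
    print(f"estimate = {est:.3f}, 90% interval = ({est - hw:.3f}, {est + hw:.3f})")
```

When the per-cycle totals are exactly proportional to the per-cycle counts the variance term vanishes and the interval collapses to a point, which is a convenient sanity check on an implementation.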

In practice the regenerative method is difficult to apply because the occurrence 
of regenerative instances is not controllable. For example, the busy periods of queueing 
systems under heavy traffic vary dramatically and so the occurrence of regeneration 
points can be quite unpredictable. 

In conclusion, simulation straddles the space between theoretical models and the 
real world. The basic introduction to simulation methods for queueing systems provides 
an excellent opportunity to illustrate the role of statistical techniques in the application 
of probability models to real world problems. The presence of transient effects and correlations
in the observed data provides an excellent opportunity to emphasize the need
to apply probability models and statistical tools with care. But we should end this book 
on a positive note: the availability of plentiful and inexpensive computing allows us to 
extend the reach of our theoretical and simulation models into new frontiers! 


SUMMARY 


• A queueing system is specified by the arrival process, the service time distribution,
the number of servers, the waiting room, and the queue discipline.

• Little’s formula states that under very general conditions: The mean number in a
system is equal to the product of the mean arrival rate and the mean time spent
in the system.

• In M/M/1, M/M/1/K, M/M/c, M/M/c/c, and M/M/∞ queueing systems, the number
of customers in the system is a continuous-time Markov chain. The steady
state distribution for the number in the system is found by solving the global balance
equations for the Markov chain. The waiting time and delay distribution
when the service discipline is first come, first served is found by using the arriving
customer’s distribution.

• If the arrival process in a queueing system is a Poisson process and if the customer
interarrival times are independent of the service times, then the arriving
customer’s distribution is the same as the steady state distribution of the queueing
system.

• In M/G/1 queueing systems the arriving customer’s distribution and the departing
customer’s distribution are both equal to the steady state distribution of the
queueing system. The steady state distribution for the number of customers in an
M/G/1 system can be found by embedding a discrete-time Markov chain at the
customer departure instants.

• Burke’s theorem states that the output process of M/M/1, M/M/c, and M/M/∞
systems at steady state are Poisson processes, and that the departure instants
prior to time t are independent of the state of the system at time t. As a result,
feedforward combinations of queueing systems with exponential service times
have a product-form solution.

• Jackson’s theorem states that for networks of queueing systems with exponential
service times and external Poisson input processes, the joint state pmf is of product
form. If the network of queues is open, the marginal state pmf of each queue
is the same as that of a queue in isolation that has Poisson arrivals of the same
rate. If the network of queues is closed, finding the joint state pmf requires finding
a normalization constant. The mean value analysis method allows us to find
the mean number in each queue, the mean time spent in each queue, and the arrival
rate in each queue in a closed network of queues.

• Approaches to simulating queueing systems include replication, time sampling,
and embedded Markov chains. The analysis of observations must deal with the
effect of transient behavior as well as the correlation of observations.


CHECKLIST OF IMPORTANT TERMS 


a/b/m/K
Arrival rate
Arriving customer’s distribution
Burke’s theorem
Carried load
Closed networks of queues
Departing customer’s distribution
Erlang B formula
Erlang C formula
Finite-source queueing system
Head-of-line priority service
Interarrival times
Jackson’s theorem
Lindley’s recursion
Little’s formula
Mean value analysis
Method of batch means
M/G/1 queueing system
M/M/c queueing system
M/M/c/c queueing system
M/M/1 queueing system
M/M/1/K queueing system
Offered load
Open networks of queues
Pollaczek-Khinchin mean value formula
Pollaczek-Khinchin transform equation
Product-form solution
Queue discipline
Regenerative method for simulation
Residual service time
Server utilization
Service discipline
Service time
Simulation based on embedded Markov chains
Simulation through independent replication
Time-sampled process simulation
Total delay
Traffic intensity
Waiting time


PROBLEMS

Sections 12.1 and 12.2: The Elements of a Queueing Network and Little’s Formula


12.1. Describe the following queueing systems: M/M/1, M/D/1/K, M/G/3, D/M/2, G/D/1, D/D/2. 
12.2. Suppose that a queueing system is empty at time t = 0, let the arrival times of the first six
customers be 1, 3, 4, 7, 8, 15, and let their respective service times be 3.5, 4, 2, 1, 1.5, 4. Find
$S_i$, $\tau_i$, $D_i$, $W_i$, and $T_i$ for i = 1, ..., 5; sketch N(t) versus t; and check Little’s formula by
computing $\langle N \rangle_t$, $\langle \lambda \rangle_t$, and $\langle T \rangle_t$ for each of the following three service disciplines:
(a) First come, first served.
(b) Last come, first served.
(c) Shortest job first (assume that the precise service time of each job is known before it
enters service).

12.3. A data communication line delivers a block of information every 10 μs. A decoder checks
each block for errors and corrects the errors if necessary. It takes 1 μs to determine
whether a block has any errors. If the block has one error, it takes 5 μs to correct it, and if
it has more than one error it takes 20 μs to correct the error. Blocks wait in a queue when
the decoder falls behind. Suppose that the decoder is initially empty and that the numbers
of errors in the first ten blocks are 0, 1, 3, 1, 0, 4, 0, 1, 0, 0.
(a) Plot the number of blocks in the decoder as a function of time.
(b) Find the mean number of blocks in the decoder.
(c) What percentage of the time is the decoder empty?


12.4. Three queues are arranged in a loop as shown in Fig. P12.1. Assume that the mean service
time in queue i is $m_i = 1/\mu_i$.

FIGURE P12.1


(a) Suppose the queue has a single customer circulating in the loop. Find the mean time 
E[T] it takes the customer to cycle around the loop. Deduce from E[T] the mean ar- 
rival rate A at each of the queues. Verify that Little’s formula holds for these two 
quantities. 

(b) If there are N customers circulating in the loop, how are the mean arrival rate and 
the mean cycle time related? 

12.5. A very popular barbershop is always full. The shop has two barbers and three chairs for 
waiting, and as soon as a customer completes his service and leaves the shop, another en- 
ters the shop. Assume the mean service time is m. 

(a) Use Little’s formula to relate the arrival rate and the mean time spent in the shop. 

(b) Use Little’s formula to relate the arrival rate and the mean time spent in service. 

(c) Use the above formulas to find an expression for the mean time spent in the system 
in terms of the mean service time. 

12.6. In Problem 12.3, suppose that the probabilities of zero, one, and more than one errors are
$p_0$, $p_1$, and $p_2$, respectively. Use Little’s formula to find the mean number of blocks in the
decoder.

12.7. A communication network receives messages from R sources with mean arrival rates
$\lambda_1, \ldots, \lambda_R$. On the average there are $E[N_i]$ messages from source i in the network.

(a) Use Little’s formula to find the average time $E[T_i]$ spent by type i customers in the
network.

(b) Let λ denote the total arrival rate into the network. Use Little’s formula to find an
expression for the mean time E[T] spent by customers (of all types) in the network
in terms of the $E[N_i]$.

(c) Combine the results of part a and part b to obtain an expression for E[T] in terms of
$E[T_i]$. Derive the same expression using A(t) the arrival processes for each type.


Section 12.3: The M/M/1 Queue 


12.8. (a) Find P[N = n] for an M/M/1 system. 


(b) What is the maximum allowable arrival rate in a system with service rate u, if we re- 
quire that P[N = 10] = 10°? 




12.9. A decision to purchase one of two machines is to be made. Machine 1 has a processing
rate of μ transactions/hour and it costs B dollars/hour to operate whether idle or not; machine 2
is twice as fast but costs twice as much to operate. Suppose that transactions arrive
at the system according to a Poisson process of rate λ and that the transaction
processing times are exponentially distributed. The total cost of the system is the operation
cost plus a cost of A dollars for each hour a customer has to wait.
(a) Find expressions for the total cost per hour for each of the systems. Plot this cost versus
the arrival rate.
(b) If A = B/10, for what range of arrival rates is machine 1 cheaper? Repeat for
A = 10B.

12.10. Consider an M/M/1 queueing system in which each customer arrival brings in a profit of
$5 but in which each unit time of delay costs the system $1. Find the range of arrival rates
for which the system makes a net profit.

12.11. Consider an M/M/1 queueing system with arrival rate λ customers/second.
(a) Find the service rate required so that the average queue is five customers (i.e.,
$E[N_q] = 5$).
(b) Find the service rate required so that the queue that forms from time to time has
mean 5 (i.e., $E[N_q \mid N_q > 0] = 5$).
(c) Which of the two criteria, $E[N_q]$ or $E[N_q \mid N_q > 0]$, do you consider the more
appropriate?

12.12. Show that the pth percentile of the waiting time for an M/M/1 system is given by

$$x = \frac{1/\mu}{1-\rho}\,\ln\!\left(\frac{\rho}{1-p}\right).$$

12.13. Consider an M/M/1 queueing system with service rate two customers per second.
(a) Find the maximum allowable arrival rate if 90% of customers should not have a
delay of more than 3 seconds.
(b) Find the maximum allowable arrival rate if 90% of customers should not have to
wait for service for more than 2 seconds. Hint: Use the result from Problem 12.12,
and then find λ by trial and error.

12.14. Verify Eq. (12.36) for the steady state pmf of an M/M/1/K system.

12.15. Consider an M/M/1/2 queueing system in which each customer accepted into the system
brings in a profit of $5 and each customer rejected results in a loss of $1. Find the arrival
rate at which the system breaks even.

12.16. For an M/M/1/K system show that

$$P[N = k \mid N < K] = \frac{P[N = k]}{1 - P[N = K]}, \qquad 0 \le k < K.$$

Why does this probability represent the proportion of arriving customers who actually
enter the system and find exactly k customers in the system?

12.17. (a) Use the matrix exponential method of Eq. (11.72) to find the transient solution for
the state pmfs for an M/M/1/5 queue under the following conditions:
(i) ρ = 0.5 and N(0) = 0, N(0) = 2, N(0) = 5;
(ii) ρ = 1 and N(0) = 0, N(0) = 2, N(0) = 5.
(b) Plot E[N(t)] vs. t for the cases considered in part a.


12.18. Suppose that two types of customers arrive at a queueing system according to independent
Poisson processes of rate λ/2. Both types of customers require exponentially distributed service
times of rate μ. Type 1 customers are always accepted into the system, but type 2 customers
are turned away when the total number of customers in the system exceeds K.
(a) Sketch the transition rate diagram for N(t), the total number of customers in the system.
(b) Find the steady state pmf of N(t).

12.19. Consider the queueing system in Problem 12.18 with K = 5 and with a maximum system
occupancy of 10 customers. In this problem we use the matrix exponential method
of Eq. (11.72) to explore how the system adjusts to sudden increases in load.
(a) Find the transient state pmf for the system with λ = 1/2 and μ = 1, assuming that
initially there are 5 customers in the system.
(b) Suppose that at time 20, λ increases to 1. Find the transient state pmf after this
surge in traffic.


Section 12.4: Multiserver Systems: M/M/c, M/M/c/c, and M/M/∞


12.20. 
12.21. 


12.22. 


12.23. 


12.24. 


12.25. 


Find P[N = c + k] for an M/M/c system. 

Customers arrive at a shop according to a Poisson process of rate 12 customers per hour. 

The shop has two clerks to attend to the customers. Suppose that it takes a clerk an expo- 

nentially distributed amount of time with mean 5 minutes to service one customer. 

(a) What is the probability that an arriving customer must wait to be served? 

(b) Find the mean number of customers in the system and the mean time spent in the 
system. 

(c) Find the probability that there are more than 4 customers in the system. 

Little’s formula applied to the servers implies that the mean number of busy servers is AE[ 7]. 

Verify this by explicit calculation of the mean number of busy servers in an M/M/c system. 

Inquiries arrive at an information center according to a Poisson process of rate 10 in- 

quiries per second. It takes a server 1/2 second to answer each query. 

(a) How many servers are needed if we require that the mean total delay for each inquiry 
should not exceed 4 seconds, and 90% of all queries should wait less than 8 seconds? 

(b) What is the resulting probability that all servers are busy? Idle? 

Consider a queueing system in which the maximum processing rate is cu customers per 

second. Let k be the number of customers in the system. When k = c, c customers are 

served at a rate u each. When 0 < k = c, these k customers are served at a rate cu/k 
each. Assume Poisson arrivals of rate à and exponentially distributed times. 

(a) Find the transition rate diagram for this system. 

(b) Find the steady state pmf for the number in the system. 

(c) Find E[W] and E[T]. 

(d) For c = 2, compare E[W] and E[7] for this system to those of M/M/1 and M/M/2 
systems of the same maximum processing rate. 

(a) Suppose that the queueing system in Problem 12.24 models a Web server where c is 
the maximum number of clients allowed to place queries at the same time. Discuss 
the impact of the choice of the parameter c on queueing and total delay performance. 

(b) Consider the fact that while connected to the Web server, clients spend their time 
in three states: sending the query, waiting for the response, and thinking after each 
response. How does this affect the choice of c? Should the system impose a time-out
limit on the customer’s connection time?

12.26. Show that the Erlang B formula satisfies the following recursive equation:

$$B(c, a) = \frac{a\,B(c-1, a)}{c + a\,B(c-1, a)},$$

where a = λE[τ].

12.27. Consider an M/M/5/5 system in which the arrival rate is 10 customers per minute and the
mean service time is 1/2 minute.
(a) Find the probability of blocking a customer. Hint: Use the result from Problem 12.26.
(b) How many more servers are required to reduce the blocking probability to 10%?

12.28. A tool rental shop has four floor sanders. Customers for floor sanders arrive according to
a Poisson process at a rate of one customer every two days. The rental time is exponentially
distributed with mean two days. If the shop has no floor sanders available, the customers go
to the shop across the street.
(a) Find the proportion of customers that go to the shop across the street.
(b) What is the mean number of floor sanders rented out?
(c) What is the increase in lost customers if one of the sanders breaks down and is not
replaced?

12.29. (a) Show that the Erlang C formula is related to the Erlang B formula by

$$C(c, a) = \frac{c\,B(c, a)}{c - a\{1 - B(c, a)\}} \qquad \text{for } c > a.$$

(b) Show that this implies that C(c, a) > B(c, a).

12.30. Suppose that department A in a certain company has three private videoconference lines
connecting two sites. Calls arrive according to a Poisson process of rate 1 call/hour, and
have an exponentially distributed holding time of 2 hours. Calls that arrive when the
three lines are busy are automatically redirected to public video lines. Suppose that
department B also has three private videoconference lines connecting the same sites, and
that it has the same arrival and service statistics.
(a) Find the proportion of calls that are redirected to public lines.
(b) Suppose we consolidate the videoconference traffic from the two departments and
allow all calls to share the six lines. What proportion of calls are redirected to public
lines?

12.31. A c = 10 server blocking system handles two streams of customers that each arrive at
rate λ/2. Type 1 customers have a mean service time of 1 time unit, and Type 2 customers
have a mean service time of 3 time units. Compare the blocking performance of a system that
allows customers to access any available server against one that allocates half the servers
to each class. Does scale matter? Does the answer change if c = 100?

12.32. Suppose we use P[N = c] from an M/M/∞ system to approximate B(c, a) in selecting the
number of servers in an M/M/c/c system. Is the resulting design optimistic or pessimistic?

12.33. During the evening rush hour, users log onto a peer-to-peer network at a rate of 10 users
per second. Each user stays connected to the network an average of 1 hour.
(a) What is the steady state pmf for the number of customers logged onto the peer-to-peer
network?
(b) Is steady state ever achieved?
(c) Is it reasonable to assume a Gaussian distribution for the number of customers in
the system?
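
The Erlang B recursion referred to in Problems 12.26 and 12.27 lends itself to a short loop that avoids large factorials. The sketch below assumes the standard starting value B(0, a) = 1; the helper `servers_needed` is a hypothetical convenience for sizing questions and is not taken from the text.

```python
def erlang_b(c, a):
    """Blocking probability B(c, a) for an M/M/c/c system with offered load a."""
    b = 1.0                      # B(0, a) = 1: with no servers every arrival is blocked
    for k in range(1, c + 1):
        b = a * b / (k + a * b)  # one step of the recursion, from k - 1 to k servers
    return b

def servers_needed(a, target):
    """Smallest number of servers c for which B(c, a) <= target (hypothetical helper)."""
    c, b = 0, 1.0
    while b > target:
        c += 1
        b = a * b / (c + a * b)
    return c

# Offered load for 10 customers per minute and a 1/2-minute mean service time.
blocking_with_5 = erlang_b(5, 10 * 0.5)
```

Because each step reuses the previous value, the same loop can answer "how many more servers" questions by continuing until the target is met.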




Section 12.5: Finite-Source Queueing Systems 


12.34. A computer is shared by 15 users as shown in Fig. 12.14(b). Suppose that the mean service
time is 2 seconds and the mean think time is 30 seconds, and that both of these times are
exponentially distributed.
(a) Find the mean delay and mean throughput of the system.
(b) What is the system saturation point K* for this system?
(c) Repeat part a if 5 users are added to the system.

12.35. A Web server that has the maximum number of clients connected is modeled by the system
in Figure 12.14(b). Suppose that the system can handle a query in 10 milliseconds and
the users click new queries at a rate of 1 every 5 seconds.
(a) Find the value of K* for this system.
(b) Find the pmf for the number of requests found in queue by arriving queries.

12.36. Find the transition rate diagram and steady state pmf for a two-server finite-source
queueing system.

12.37. Verify that Eqs. (12.84) and (12.81) give E[T] as given in Eq. (12.72).

12.38. Consider a c-server, finite-source queueing system that allows no queueing for service.
Requests that arrive when all servers are busy are turned away, and the corresponding
source immediately returns to the “think” state, and spends another exponentially
distributed think time before submitting another request for service.
(a) Find the transition rate diagram and show that the steady state pmf for the state of
the system is

$$P[N = j] = \frac{\binom{K}{j} p^{j} (1 - p)^{K-j}}{\sum_{i=0}^{c} \binom{K}{i} p^{i} (1 - p)^{K-i}}, \qquad j = 0, \ldots, c,$$

where c is the number of servers, K is the number of sources, and

$$p = \frac{\alpha/\mu}{1 + \alpha/\mu}.$$

(b) Find the probability that all servers are busy.
(c) Use the fact that arriving customers “see” the steady state pmf of a system with one
less source to show that the fraction of arrivals that are turned away is given by
P_{K−1}(c). The resulting expression is called the Engset formula.

12.39. A video-on-demand system is modeled as a c = 10 server system that handles video
chunk requests from K clients. Suppose that the system is modeled by the Engset system
from Problem 12.38. Suppose that users generate requests at a rate of one per second and
that each server can meet the request within 100 ms. Find the number of clients that can be
connected if the probability of turning away a request is 10%. What if it is 1%?
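
The Engset expressions of Problem 12.38 can be evaluated directly once the pmf is written in its truncated-binomial form. The sketch below assumes each idle source generates requests at rate α and each server works at rate μ; the names `engset_pmf` and `engset_blocking` are illustrative only.

```python
from math import comb

def engset_pmf(K, c, alpha, mu):
    """Steady state pmf P[N = j], j = 0..c, for K sources and c servers, no queueing."""
    p = (alpha / mu) / (1 + alpha / mu)
    weights = [comb(K, j) * p**j * (1 - p)**(K - j) for j in range(c + 1)]
    total = sum(weights)                  # normalizing sum over the states 0..c
    return [w / total for w in weights]

def engset_blocking(K, c, alpha, mu):
    """Fraction of requests turned away: P[N = c] in a system with K - 1 sources."""
    return engset_pmf(K - 1, c, alpha, mu)[c]

# Assumed reading of Problem 12.39: alpha = 1 request/s per client, mu = 10 per second.
blocking_by_clients = {K: engset_blocking(K, 10, 1.0, 10.0) for K in (20, 40, 60, 80)}
```

Scanning K upward until the blocking value crosses a target gives the kind of sizing answer the problem asks for.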


Section 12.6: M/G/1 Queueing Systems 


12.40. Find the mean waiting time and mean delay in an M/G/1 system in which the service time
is a k-Erlang random variable (see Table 4.1) with mean 1/μ. Compare the results to
M/M/1 and M/D/1 systems.

12.41. A k = 2 hyperexponential random variable is obtained by selecting a service time at
random from one of two exponential random variables as shown in Fig. P12.2. Find the mean
delay in an M/G/1 system with this hyperexponential service time distribution.
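
For Problems 12.40 and 12.41, the Pollaczek-Khinchin mean value formula E[W] = λE[τ²] / (2(1 − ρ)) needs only the first two moments of the service time. The sketch below is a minimal illustration; the helper names are hypothetical, and the moment expressions used are the standard ones for Erlang and hyperexponential random variables.

```python
def pk_mean_wait(lam, mean_s, second_moment_s):
    """Mean waiting time in an M/G/1 queue (Pollaczek-Khinchin mean value formula)."""
    rho = lam * mean_s
    if rho >= 1:
        raise ValueError("the queue is unstable: rho must be below 1")
    return lam * second_moment_s / (2 * (1 - rho))

def erlang_k_second_moment(k, mu):
    """E[tau^2] for a k-Erlang service time with mean 1/mu."""
    return (1 + 1 / k) / mu**2

def hyperexp_moments(probs, rates):
    """(E[tau], E[tau^2]) when branch i is chosen with probability probs[i]."""
    mean = sum(p / r for p, r in zip(probs, rates))
    second = sum(2 * p / r**2 for p, r in zip(probs, rates))
    return mean, second

lam, mu = 0.5, 1.0
wait_mm1 = pk_mean_wait(lam, 1 / mu, 2 / mu**2)                       # exponential service
wait_md1 = pk_mean_wait(lam, 1 / mu, 1 / mu**2)                       # constant service
wait_erl = pk_mean_wait(lam, 1 / mu, erlang_k_second_moment(2, mu))   # 2-Erlang service
```

The mean delay follows by adding the mean service time to each waiting time.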


FIGURE P12.2


12.42. Customers arrive at a queueing system according to a Poisson process of rate λ. A
fraction α of the customers require a fixed service time d, and a fraction 1 − α require an
exponential service time of mean 1/μ. Find the mean waiting time and mean delay in the
resulting M/G/1 system.


12.43. Find the mean waiting time and mean delay in an M/G/1 system in which the service time
consists of a fixed time d plus an exponentially distributed time of mean 1/μ.


12.44. Fixed-length messages arrive at a transmitter according to a Poisson process of rate λ.
The time required to transmit a message and to receive an acknowledgment is d seconds.
If a message is acknowledged as having been received correctly, then the transmitter
proceeds with the next message. If the message is acknowledged as having been received in
error, the transmitter retransmits the message. Assume that a message undergoes errors
in transmission with probability p, and that transmission errors are independent.


(a) Find the mean and variance of the effective message service time. 
(b) Find the mean message delay. 


12.45. Packets at a router with a 1 Gigabit/second transmission line arrive at a rate of λ
packets per second. Suppose that half the packets are 40 bytes long and half the packets are
1500 bytes long. Find the mean packet delay as a function of λ.


12.46. A file server receives requests at a rate of λ requests per second. The server can transmit
files at a rate of 12.5 Megabytes per second. Suppose that file lengths have a Pareto
distribution with mean 1 Megabyte.


(a) Find the average delay in meeting a file request. 
(b) Discuss the effect of the Pareto distribution parameter on system performance. 


12.47. Jobs arrive at a machine according to a Poisson process of rate λ. The service times for
the jobs are exponentially distributed with mean 1/μ. The machine has a tendency to
break down while it is serving customers; if a particular service time is t, then the
number of breakdowns during this service time is a Poisson random variable
with mean αt. It takes an exponentially distributed time with mean 1/β to repair the
machine. Assume a machine is always working when it begins a job.


(a) Find the mean and variance of the total time required to complete a job. Hint: Use 
conditional expectation. 


(b) Find the mean job delay for this system. 


12.48. Consider a two-class nonpreemptive priority queueing system, and suppose that the
lower-priority class is saturated (i.e., λ₁E[τ₁] + λ₂E[τ₂] > 1).
(a) Show that the rate of low-priority customers served by the system is
λ₂′ = (1 − λ₁E[τ₁])/E[τ₂]. Hint: What proportion of time is the server busy with
class two customers?
(b) Show that the mean waiting time for class 1 customers is

$$E[W_1] = \frac{(1/2)\lambda_1 E[\tau_1^2]}{1 - \lambda_1 E[\tau_1]} + \frac{E[\tau_2^2]}{2E[\tau_2]}.$$

12.49. Consider an M/G/1 system in which the server goes on vacations (becomes unavailable)
whenever it empties the queue. If upon returning from vacation the system is still empty,
the server takes another vacation, and so on until it finds customers in the system.
Suppose that vacation times are independent of each other and of the other variables in the
system. Show that the mean waiting time for customers in this system is

$$E[W] = \frac{(1/2)\lambda E[\tau^2]}{1 - \lambda E[\tau]} + \frac{E[V^2]}{2E[V]},$$

where V is the vacation time. Hint: Show that this system is equivalent to a nonpreemptive
priority system and use the result of Problem 12.48.

12.50. Fixed-length packets arrive at a concentrator that feeds a synchronous transmission
system. The packets arrive according to a Poisson process of rate λ, but the transmission
system will only begin packet transmissions at times id, i = 1, 2, ..., where d is the
transmission time for a single packet. Find the mean packet waiting time. Hint: Show that
this is an M/D/1 queue with vacations as in Problem 12.49.

12.51. A queueing system handles two types of traffic. Type i traffic arrives according to a
Poisson process and has exponentially distributed service times with mean 1/μᵢ for i = 1, 2.
Suppose that type 1 customers are given nonpreemptive priority. Plot the overall and
per-class mean waiting time versus λ if λ₁ = λ₂ = λ, μ₁ = 1, μ₂ = 1/10.

12.52. Consider a two-class priority M/G/1 system in which high-priority customer arrivals
preempt low-priority customers who are found in service. Preempted low-priority customers
are placed at the head of their queue, and they resume service when the server again
becomes available to low-priority customers.
(a) What is the mean waiting time and the mean delay for the high-priority customers?
(b) Show that the time required to service all customers found by a type 2 arrival to the
system is

$$\frac{R_2}{1 - \rho_1 - \rho_2},$$

where ρ_j = λ_j E[τ_j], and

$$R_2 = \frac{1}{2} \sum_{j=1}^{2} \lambda_j E[\tau_j^2].$$

(c) Show that the time required to service all type 1 customers who arrive during the
time a type 2 customer spends in the system is ρ₁E[T₂].
(d) Use parts b and c to show that

$$E[T_2] = \frac{(1 - \rho_1 - \rho_2)/\mu_2 + R_2}{(1 - \rho_1)(1 - \rho_1 - \rho_2)}.$$

12.53. Evaluate and plot the formulas developed in Problem 12.52 using the two traffic classes 
described in Problem 12.51. 


Section 12.7: M/G/1 Analysis Using Embedded Markov Chain 


12.54. The service time in an M/G/1 system has a k = 2 Erlang distribution with mean 1/μ and
λ = μ/2.
(a) Find G_N(z) and P[N = j].
(b) Find W(s) and T(s) and the corresponding pdf’s.

12.55. (a) In Problem 12.47, show that the Laplace transform of the pdf for the total time τ
required to complete the service of a customer is

$$\hat{\tau}(s) = \frac{\mu(s + \beta)}{(s + \beta)(s + \mu) + \alpha s}.$$

Hint: Use conditional expectation in evaluating E[e^{−sτ}], and note that the number of
breakdowns depends on the service time of the customer.
(b) Find W(s) and T(s) and the corresponding pdf’s.

12.56. (a) Show that Eqs. (12.110a) and (12.110b) can be written as

$$N_j = N_{j-1} - U(N_{j-1}) + M_j, \qquad (12.186)$$

where

$$U(x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0. \end{cases}$$

(b) Take the expected value of both sides of Eq. (12.186) to obtain an expression for
P[N > 0].
(c) Square both sides of Eq. (12.186) and take the expected value to obtain the
Pollaczek-Khinchin formula for E[N].

12.57. (a) Show that for an M/D/1 system,

$$G_N(z) = \frac{(1 - \rho)(1 - z)}{1 - z e^{\rho(1 - z)}}.$$

(b) Expand the denominator in a geometric series, and then identify the coefficient of z^k
to obtain

$$P[N = k] = (1 - \rho) \sum_{j=0}^{k} \frac{(-j\rho)^{k-j-1} (-j\rho - k + j)\, e^{j\rho}}{(k - j)!}.$$

12.58. (a) Show that Eq. (12.130) can be rewritten as

$$\hat{W}(s) = \frac{1 - \rho}{1 - \rho \hat{R}(s)}, \qquad (12.187)$$

where

$$\hat{R}(s) = \frac{1 - \hat{\tau}(s)}{s E[\tau]}$$

is the Laplace transform of the pdf of the residual service time.
(b) Expand the denominator of Eq. (12.187) in a geometric series and invert the resulting
transform expression to show that

$$f_W(x) = \sum_{k=0}^{\infty} (1 - \rho) \rho^{k} f^{(k)}(x), \qquad (12.188)$$

where f^{(k)}(x) is the kth-order convolution of the residual service time pdf.
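
The M/D/1 pmf in Problem 12.57 can be checked numerically. The sketch below evaluates the coefficient of z^k as the difference of two series terms, which is algebraically the same sum but avoids the 0/0 that the compact form produces at j = k = 0. The function name `md1_pmf` is illustrative, and because the terms alternate in sign the sum is only reliable for moderate k.

```python
from math import exp, factorial

def md1_pmf(k, rho):
    """P[N = k] for an M/D/1 queue with utilization rho (alternating-sum form)."""
    total = 0.0
    for j in range(k + 1):
        m = k - j
        # coefficient of z^m in exp(-j*rho*z), minus the shifted term that
        # comes from multiplying the series by (1 - z)
        term = (-j * rho) ** m / factorial(m)
        if m >= 1:
            term -= (-j * rho) ** (m - 1) / factorial(m - 1)
        total += exp(j * rho) * term
    return (1 - rho) * total

rho = 0.5
pmf = [md1_pmf(k, rho) for k in range(20)]
mean_n = sum(k * p for k, p in enumerate(pmf))
```

For ρ = 0.5 the truncated mean can be compared with the Pollaczek-Khinchin value ρ + ρ²/(2(1 − ρ)).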


12.59. Approximate f_W(x) for an M/D/1 system using the k = 0, 1, 2 terms of Eq. (12.188).
Sketch the resulting pdf for ρ = 1/2.


Section 12.8: Burke’s Theorem: Departures from M/M/c Systems 


12.60. Consider the interdeparture times from a stable M/M/1 system in steady state.
(a) Show that if a departure leaves the system nonempty, then the time to the next
departure is an exponential random variable with mean 1/μ.
(b) Show that if a departure leaves the system empty, then the time to the next
departure is the sum of two independent exponential random variables of means 1/λ and
1/μ, respectively.
(c) Combine the results of parts a and b to show that the interdeparture times are
exponential random variables with mean 1/λ.

12.61. Find the joint pmf for the number of customers in the queues in the network shown in
Fig. P12.3.

FIGURE P12.3


12.62. Write the balance equations for the feedforward network shown in Fig. P12.4 and verify
that the joint state pmf is of product form.

FIGURE P12.4

12.63. Verify that Eqs. (12.137) through (12.139) satisfy Eq. (12.135).


Section 12.9: Networks of Queues: Jackson’s Theorem 


12.64. Find the joint state pmf for the open network of queues shown in Fig. P12.5.

FIGURE P12.5


12.65. A computer system model has three programs circulating in the network of queues 
shown in Fig. P12.6. 
(a) Find the joint state pmf of the system. 
(b) Find the average program completion rate.

FIGURE P12.6


12.66. Use the mean value analysis algorithm to answer Problem 12.65, part b. 


Section 12.10: Simulation and Data Analysis of Queueing Systems 


12.67. (a) Repeat the experiment in Example 12.28 for an M/M/1 system with ρ = 0.5, 0.7, and
0.9. Use sample means for N(t) based on 25 replications to characterize the transient 
behavior. Try out smoothing the sample means using a moving average filter over 
time. Give an estimate of the time to reach steady state in each of these systems. 

(b) Now investigate the effect of initial condition on the duration of the transient phase. 
For each of the utilizations above compare the transient duration when the initial 
condition is: N(0) = 0; N(0) = 5; N(0) = 10. 


12.68. For the experiment in Problem 12.67, calculate the sample covariance for each realization
and then average over the 25 replications. Find the number of lags required for each
value of ρ until the correlation drops to zero. Comment on the implications for the size of
the batches if a method of batch means approach is to be used.

12.69. The correlation of N(t) for an M/M/1 system has the following geometric upper bound
[Fishman]:

$$\rho_j \le \left( \frac{4\rho}{(1 + \rho)^2} \right)^{j} \qquad \text{for } j = 0, 1, 2, \ldots$$

Evaluate the ratio of the variance of the sample mean estimator for this process to that of
an iid process when ρ = 0.5, 0.75, 0.9, 0.99.

12.70. Run the simulation for the experiment in Example 12.29 50 times. For each simulation
produce a confidence interval using the method of batch means. Determine the fraction
of the confidence intervals that covered the actual mean E[N]. Comment on the accuracy
of the confidence intervals given by Eq. (12.168).

12.71. Develop a simulation model for an M/M/3 system with λ = 2 customers per second and
μ = 1 customer per second. Use the method of batch means as in Example 12.29 to
estimate the probability that an arriving customer has to queue for service. Provide
appropriate confidence intervals.

12.72. (a) Consider the simulation in Example 12.30 where the embedded Markov chain
approach is used to estimate the steady state pmf. For ρ = 0.5 and ρ = 0.9, use different
warm-up periods to investigate the effect of the initial transient on the pmf estimates.
(b) Double the number of replications and observe the impact on the confidence
intervals.

12.73. Develop a simulation for an M/D/1 system with ρ = 0.7 using the embedded Markov
chain in Eq. (12.172). Design the simulation to estimate the pmf for the number of
customers in the system as well as the mean number in the system.
(a) Discuss what transient effects can be expected in this approach.
(b) Use the method of batch means to develop estimates for the mean number of
customers in the system. Discuss the choice of batch size and warm-up period. Evaluate
the confidence intervals produced by several realizations.

12.74. Use Lindley’s recursion to estimate the waiting-time distribution for customers in an
M/D/1 system with ρ = 0.5 and ρ = 0.7. Is there anything peculiar about the distribution?

12.75. Use Lindley’s recursion to estimate the waiting-time distribution for customers in a
D/M/1 system with ρ = 0.5 and ρ = 0.7.

12.76. Use Lindley’s recursion to estimate the waiting-time distribution for customers in an
M/G/1 system with ρ = 0.5 and ρ = 0.7 where the service-time distribution is Pareto
with parameter α = 2.5. Try a simulation with α = 1.5. Does anything peculiar happen?

12.77. Repeat the experiment in Example 12.33, but use the method of batch means to provide
confidence intervals for the mean waiting time.

12.78. Explain why the estimator in Eq. (12.183) will converge to the expected value of the
waiting time.

12.79. Use the regenerative method to estimate the mean number in the system and the
probability that the system is empty in an M/D/1 system. Evaluate the confidence interval
provided by Eq. (12.185).
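
Problems 12.74 through 12.76 rely on Lindley’s recursion, which updates the waiting time of successive customers as W(n+1) = max(0, W(n) + τ(n) − A(n+1)), where τ(n) is a service time and A(n+1) the next interarrival time. A minimal sketch follows, assuming an initially empty system and a unit service time for the M/D/1 case; the function names and the seeding scheme are illustrative, not taken from the text.

```python
import random

def lindley_waits(n, next_interarrival, next_service, seed=0):
    """Waiting times of the first n customers, starting from an empty system.

    next_interarrival and next_service are callables that take a random.Random
    instance and return one sample each.
    """
    rng = random.Random(seed)
    w = 0.0
    waits = []
    for _ in range(n):
        waits.append(w)
        # Lindley's recursion: carry over unfinished work, never below zero
        w = max(0.0, w + next_service(rng) - next_interarrival(rng))
    return waits

def md1_waits(n, rho, seed=0):
    """M/D/1 with unit service time, so the Poisson arrival rate equals rho."""
    return lindley_waits(n, lambda rng: rng.expovariate(rho), lambda rng: 1.0, seed)

waits = md1_waits(200_000, 0.5, seed=1)
mean_wait = sum(waits) / len(waits)
fraction_no_wait = sum(1 for w in waits if w == 0.0) / len(waits)
```

Swapping in other samplers (constant interarrival times for D/M/1, or a Pareto service sampler) changes the system without touching the recursion itself. The fraction of customers with exactly zero wait is worth examining when looking at the shape of the estimated distribution.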
Problems Requiring Cumulative Knowledge


12.80. Consider an M/M/2/2 system in which one server is twice as fast as the other server.
(a) What definition of “state” of the system results in a continuous-time Markov chain?
(b) Find the steady state pmf for the system if customers arriving at an empty system are
always routed to the faster server.
(c) Find the steady state pmf for the system if customers arriving at an empty system are
equally likely to be routed to either server.

12.81. (a) Find the transient pmf, P[N(t) = j], for an M/M/1/2 system which is in the empty
state at time 0.
(b) Repeat part a if the system is full at time 0.

12.82. (a) In an M/G/1 system, why are the set of times when customers arrive to an empty
system renewal instants?
(b) How would you apply the results from renewal theory in Section 7.5 to estimate the
pmf for the number of customers in the system?
(c) How would you obtain a confidence interval for P[N(t) = j]?

12.83. Let N(t) be a Poisson random process with parameter λ. Suppose that each time an event
occurs, a coin is flipped and the outcome is recorded. Assume that the probability of
heads depends on the time of the arrival and is denoted by p(t). Let N₁(t) and N₂(t)
denote the number of heads and tails recorded up to time t, respectively.
(a) Show that N₁(t) and N₂(t) are independent Poisson random variables with rates pλ
and (1 − p)λ, where

$$p = \frac{1}{t} \int_0^t p(t')\, dt'.$$

(b) Are N₁(t) and N₂(t) independent Poisson random processes? If so, how would you
show this?

12.84. Consider an M/G/∞ system in which customers arrive at rate λ and in which the
customer service times have distribution F(x). Suppose that the system is empty at time 0.
Let N₁(t) be the number of customers who have completed their service by time t, and let
N₂(t) be the number of customers still in the system at time t.
(a) Use the result of Problem 12.83 to find the joint pmf of N₁(t) and N₂(t).
(b) What is the steady state pmf for the number of customers in an M/G/∞ system?
(c) Apply Little’s formula to compute the average number of customers in the system.
Is the result consistent with your result in part b?


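APPENDIX A


Mathematical Tables


A. TRIGONOMETRIC IDENTITIES

$\sin^2\alpha + \cos^2\alpha = 1$
$\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$
$\sin(\alpha - \beta) = \sin\alpha\cos\beta - \cos\alpha\sin\beta$
$\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$
$\cos(\alpha - \beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$
$\sin 2\alpha = 2\sin\alpha\cos\alpha$
$\cos 2\alpha = \cos^2\alpha - \sin^2\alpha = 2\cos^2\alpha - 1 = 1 - 2\sin^2\alpha$
$\sin\alpha\sin\beta = \tfrac{1}{2}\cos(\alpha - \beta) - \tfrac{1}{2}\cos(\alpha + \beta)$
$\cos\alpha\cos\beta = \tfrac{1}{2}\cos(\alpha - \beta) + \tfrac{1}{2}\cos(\alpha + \beta)$
$\sin\alpha\cos\beta = \tfrac{1}{2}\sin(\alpha + \beta) + \tfrac{1}{2}\sin(\alpha - \beta)$
$\cos\alpha\sin\beta = \tfrac{1}{2}\sin(\alpha + \beta) - \tfrac{1}{2}\sin(\alpha - \beta)$
$\sin^2\alpha = \tfrac{1}{2}(1 - \cos 2\alpha)$
$\cos^2\alpha = \tfrac{1}{2}(1 + \cos 2\alpha)$
$e^{j\alpha} = \cos\alpha + j\sin\alpha$
$\cos\alpha = (e^{j\alpha} + e^{-j\alpha})/2$
$\sin\alpha = (e^{j\alpha} - e^{-j\alpha})/2j$
$\sin\alpha = \cos(\alpha - \pi/2)$


B. INDEFINITE INTEGRALS

$\int u\,dv = uv - \int v\,du$, where u and v are functions of x
$\int x^n\,dx = x^{n+1}/(n+1)$, except for n = −1
$\int x^{-1}\,dx = \ln x$
$\int e^{ax}\,dx = e^{ax}/a$
$\int \ln x\,dx = x\ln x - x$
$\int (a^2 + x^2)^{-1}\,dx = (1/a)\tan^{-1}(x/a)$
$\int (\ln x)^n/x\,dx = (1/(n+1))(\ln x)^{n+1}$
$\int x^n \ln ax\,dx = (x^{n+1}/(n+1))\ln ax - x^{n+1}/(n+1)^2$
$\int x e^{ax}\,dx = e^{ax}(ax - 1)/a^2$
$\int x^2 e^{ax}\,dx = e^{ax}(a^2x^2 - 2ax + 2)/a^3$
$\int \sin ax\,dx = -(1/a)\cos ax$
$\int \cos ax\,dx = (1/a)\sin ax$
$\int \sin^2 ax\,dx = x/2 - \sin(2ax)/4a$
$\int x \sin ax\,dx = (1/a^2)(\sin ax - ax\cos ax)$
$\int x^2 \sin ax\,dx = \{2ax\sin ax + 2\cos ax - a^2x^2\cos ax\}/a^3$
$\int \cos^2 ax\,dx = x/2 + \sin(2ax)/4a$
$\int x \cos ax\,dx = (1/a^2)(\cos ax + ax\sin ax)$
$\int x^2 \cos ax\,dx = (1/a^3)\{2ax\cos ax - 2\sin ax + a^2x^2\sin ax\}$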


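C. DEFINITE INTEGRALS

$\int_0^\infty t^{n-1} e^{-(a+1)t}\,dt = \Gamma(n)/(a+1)^n$, n > 0, a > −1
$\Gamma(n) = (n-1)!$ if n is an integer, n > 0
$\Gamma(n + \tfrac{1}{2}) = \dfrac{1\cdot 3\cdot 5\cdots(2n-1)}{2^n}\sqrt{\pi}$, n = 1, 2, 3, ...
$\int_0^\infty e^{-\alpha^2x^2}\,dx = \sqrt{\pi}/2\alpha$
$\int_0^\infty x e^{-\alpha^2x^2}\,dx = 1/2\alpha^2$
$\int_0^\infty x^2 e^{-\alpha^2x^2}\,dx = \sqrt{\pi}/4\alpha^3$
$\int_0^\infty x^n e^{-\alpha^2x^2}\,dx = \Gamma((n+1)/2)/(2\alpha^{n+1})$
$\int_0^\infty a/(a^2 + x^2)\,dx = \pi/2$ if a > 0
$\int_0^\infty \dfrac{\sin^2 ax}{x^2}\,dx = |a|\pi/2$ if a > 0
$\int_0^1 x^{a-1}(1-x)^{b-1}\,dx = B(a, b) = \dfrac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$


APPENDIX B


Tables of Fourier Transforms


A. FOURIER TRANSFORM DEFINITION

$$G(f) = \mathcal{F}\{g(t)\} = \int_{-\infty}^{\infty} g(t)\, e^{-j2\pi ft}\, dt$$

$$g(t) = \mathcal{F}^{-1}\{G(f)\} = \int_{-\infty}^{\infty} G(f)\, e^{j2\pi ft}\, df$$


B. PROPERTIES

Linearity:               $\mathcal{F}\{a g_1(t) + b g_2(t)\} = aG_1(f) + bG_2(f)$
Time scaling:            $\mathcal{F}\{g(at)\} = G(f/a)/|a|$
Duality:                 If $\mathcal{F}\{g(t)\} = G(f)$, then $\mathcal{F}\{G(t)\} = g(-f)$
Time shifting:           $\mathcal{F}\{g(t - t_0)\} = G(f)\, e^{-j2\pi ft_0}$
Frequency shifting:      $\mathcal{F}\{g(t)\, e^{j2\pi f_0 t}\} = G(f - f_0)$
Differentiation:         $\mathcal{F}\{g'(t)\} = j2\pi f\, G(f)$
Integration:             $\mathcal{F}\{\int_{-\infty}^{t} g(s)\, ds\} = G(f)/(j2\pi f) + (G(0)/2)\,\delta(f)$
Multiplication in time:  $\mathcal{F}\{g_1(t) g_2(t)\} = G_1(f) * G_2(f)$
Convolution in time:     $\mathcal{F}\{g_1(t) * g_2(t)\} = G_1(f) G_2(f)$


C. TRANSFORM PAIRS

g(t)                                        G(f)
$1$ for $|t| < T$, 0 elsewhere              $2T \sin(2\pi fT)/(2\pi fT)$
$2W \sin(2\pi Wt)/(2\pi Wt)$                $1$ for $|f| < W$, 0 elsewhere
$1 - |t|/T$ for $|t| < T$, 0 elsewhere      $T(\sin(\pi fT)/(\pi fT))^2$
$e^{-at}u(t)$, a > 0                        $1/(a + j2\pi f)$
$e^{-a|t|}$, a > 0                          $2a/(a^2 + (2\pi f)^2)$
$e^{-\pi t^2}$                              $e^{-\pi f^2}$
$\delta(t)$                                 $1$
$1$                                         $\delta(f)$
$\delta(t - t_0)$                           $e^{-j2\pi ft_0}$
$e^{j2\pi f_0 t}$                           $\delta(f - f_0)$
$\cos(2\pi f_0 t)$                          $\tfrac{1}{2}\{\delta(f - f_0) + \delta(f + f_0)\}$
$\sin(2\pi f_0 t)$                          $(1/2j)\{\delta(f - f_0) - \delta(f + f_0)\}$
$u(t)$                                      $\tfrac{1}{2}\delta(f) + 1/(j2\pi f)$


Matrices and Linear Algebra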


BASIC DEFINITIONS 


Let A = [a_ij] be an m row by n column matrix with element a_ij in the ith row and jth
column. A matrix is square if m = n.

The transpose of A is the n row by m column matrix A^T = [a_ji], which has element
a_ij in the jth row and ith column, and which is obtained by interchanging the rows
and columns of A. The transpose of the product of matrices is equal to the product of
the transposes in reverse order:


(AB)^T = B^T A^T and (ABC)^T = C^T B^T A^T.


The identity matrix I is a square matrix whose diagonal elements equal 1 and
off-diagonal elements equal zero. For any square matrix A:


AI = IA = A.

The inverse of a square matrix A is a square matrix A^{-1} for which

AA^{-1} = A^{-1}A = I.


We say that A is invertible if A^{-1} exists, and singular otherwise.


DIAGONALIZATION 
A nonzero vector e = (e_1, e_2, ..., e_n)^T is an eigenvector of an n × n matrix A if it satisfies:

Ae = λe


for some scalar λ. λ is called an eigenvalue of A and e an eigenvector of A corresponding
to λ.
The eigenvalues of A are found by finding the roots of the polynomial equation:


det(λI − A) = 0.


An n × n matrix A is said to be diagonalizable if there exists an invertible matrix P
such that P^{-1}AP = D, a diagonal matrix, or equivalently AP = PD.




Theorem: 


A is diagonalizable if and only if A has n linearly independent eigenvectors. 

A square matrix P is orthogonal if P^{-1} = P^T, or equivalently, PP^T = P^TP = I.

A set of vectors {e_1, e_2, ..., e_n} is said to be orthonormal if distinct vectors are
orthogonal, that is, e_i^T e_j = 0 for i ≠ j, and e_i^T e_i = 1 for i = 1, ..., n.


Theorem: 


If the set of vectors {e_1, e_2, ..., e_n} are nonzero and orthogonal then they are also linearly
independent.

An n × n matrix A is said to be orthogonally diagonalizable if there exists an orthogonal
matrix P such that P^T AP = D, a diagonal matrix, or equivalently AP = PD.

An n × n matrix A is symmetric if A = A^T.


Theorem: 


A symmetric matrix A has only real eigenvalues. 


Theorem: 


The following conditions are equivalent: 
a. A is orthogonally diagonalizable, 
b. A has an orthonormal set of n eigenvectors, 
c. A is a symmetric matrix.


QUADRATIC FORMS 


The n × n real symmetric matrix A and the n × 1 column vector x = (x_1, x_2, ..., x_n)^T
have the quadratic form given by:

$$x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j.$$

A is nonnegative definite if x^T Ax ≥ 0 for all x, and positive definite if x^T Ax > 0 for all
nonzero x.

Let A = [a_ij] be an n × n matrix, then the kth principal submatrix of A is the
k × k matrix A_k = [a_ij] with element a_ij in the ith row and jth column.


Theorem: 


A symmetric matrix A is positive definite (nonnegative definite) if and only if


a. All eigenvalues are positive (nonnegative) and
b. The determinants of all principal submatrices are positive (nonnegative).
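
The determinant condition can be turned into a small numerical test for the positive definite case. The sketch below is an illustration only: it assumes a real symmetric matrix stored as a list of rows, checks that every leading principal submatrix has a positive determinant, and does not attempt the nonnegative definite case. The function names are hypothetical.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]       # work on a copy
    n = len(a)
    d = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[pivot][i] == 0:
            return 0.0                   # singular matrix
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            d = -d                       # a row swap flips the sign
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return d

def is_positive_definite(A):
    """True if all k x k leading principal submatrices of symmetric A have det > 0."""
    n = len(A)
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, n + 1))

tridiagonal = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
indefinite = [[1.0, 2.0], [2.0, 1.0]]
checks = (is_positive_definite(tridiagonal), is_positive_definite(indefinite))
```

For large or ill-conditioned matrices an eigenvalue or Cholesky routine from a numerical library would be a more robust choice than explicit determinants.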


If A is a positive definite matrix then x^T A^{-1} x = 1 is the equation of an ellipsoid with center
at the origin. The kth semiaxis of the ellipsoid is given by √λ_k e_k; that is, the eigenvectors
determine the direction of the semiaxes and the eigenvalues determine the corresponding length.


A


Almost-sure convergence, 381-382, 385 
Amplitude modulation (AM): 
bandpass signal, 602 
quadrature amplitude modulation (QAM) 
method, 603-604 
by random signals, 601-605 
Aperiodic state, 667 
ARMA random process, 595-596 
Arrival rate, 714 
Arrival theorem, 766-770 
proof of, 769-770 
Associative properties, 28 
Autocorrelation function, 494-495
Autocovariance function, 494-495 
Autoregressive moving average (ARMA) 
process, 595-596 
Autoregressive processes, 595 
random, 507 
Average power, 522, 579
Axioms of probability, 21, 30-41, 79 
continuous sample spaces, 37-41 
discrete sample spaces, 35-37 


B 


Bandlimited random processes, 597-605 
amplitude modulation by random 
signals, 601-605 
sampling of, 597-601 
Bandpass signal, 602 
amplitude modulation (AM), 602 
Bartlett’s smoothing procedure, 628 
Batch means: 
confidence intervals using, 775-776 
method of, 775-776 
Bayes estimation, 461-462 
Bayes hypothesis testing, 455-460
binary communications, 457-458
MAP receiver for, 458-459
minimum cost hypothesis test, 457 
server allocation, 459-460 
Bayes’ rule, 52-53, 79 


Bayesian decision methods, 455-462
Bayes hypothesis testing, 455-460
minimum cost theorem 

proof of, 460-461 

Bernoulli random variables, 102 
coin toss, 117 
estimation, 428 

of p for, 421 
Fisher information for, 424-425
mean of, 105 
properties of, 115 
variance of, 110 

Bernoulli trials, 60 
and binomial probabilities, 70 
estimating p in, 461-462 

Beta random variables, 165, 172-173 
generating, 198 

Bias, estimators, 416 

Binary communication system, 50, 52 

Binary random variable: 
entropy of, 203-205 

Binary transmission system, probabilities of 

input-output pairs in, 50 

Binomial counting process, 493, 501-502 
independent and stationary increments of, 504 
joint pmf of, 505 
Markov chains, 663 
transient state, 666 

Binomial probability law, 60-62 

Binomial random variables, 103 
Chernoff bound for, 375-377 
coin toss, 118 
defined, 117-118 
mean of, 118-119 
negative 

properties of, 116 
properties of, 115 
redundant systems, 119 
sampling distribution of, 414-415 
three coin tosses and, 105 
variance of, 119 

Binomial theorem, 61-62 

Birth-and-death process, 682-683 

Borel fields, 30, 38, 75-77
Brownian motion, 517
Burke’s theorem, 754-758 
proof of, using time reversibility, 757-758 


C


Cauchy random variables, 165, 173
Causal filters, 615
estimation using, 614-617
Causal system, 588 
Central limit theorem, 167fn, 369-378 
Chernoff bound for binomial random 
variable, 375-377 
Gaussian approximation for binomial 
probabilities, 373-375 
proof of, 377-378 
Certain event, 24 
Chapman-Kolmogorov equations, 654, 677 
Characteristic function, 184-187 
for an exponentially distributed random variable, 185 
for a geometric random variable, 185 
Chebyshev inequality, 181-183 
Chernoff bound, 183 
for binomial random variable, 375-377 
for Gaussian random variable, 187 
Chi-square goodness-of-fit test, 465 
Chi-square random variable, 170 
Chi-square test, 463-468 
for exponential random variable: 
equal-length intervals (table), 467 
equiprobable intervals (table), 468 
for Poisson random variable (table), 468 
Circuit theory, 4 
Circuit theory models, 4 
Classes of states, 660-662 
Closed networks of queues, 763-766 
Combinatorial formulas, 41, 44, 79
Communication over unreliable channels, 12-13 
Communication system design, 9 
Commutative properties, 28 
Complement, of a set, 27 
Complement operation, 27 
Composite hypotheses, testing, 449-455 
Compression of signals, 13 
Computer simulation models, 3-4, 79 
Conditional cdf’s, 152-155 
Conditional expectation, 268-271, 336 
Conditional pdf's, 153-155, 307 
Conditional pmf’s, 306 
Conditional probability, 21, 47-53, 79, 261-268 
Conditional probability mass function, 111-114 
conditional expected value, 113-114 
device lifetimes, 114 
device lifetimes, 113 
random clock, 112 
residual waiting times, 112-113 


Conditional variance of X given B: 
defined, 114 
Confidence intervals, 430-441 
batch means method (example), 435 
cases, 431-435 
confidence level, 431 
and hypothesis testing, 455 
for the variance of a Gaussian random 
variable, 436-437
Consistent estimators, 418 
Continuity of probability, 76-77 
Continuous random variables, 146-149, 163-174 
beta, 165, 172-173 
calculating distributions using the discrete Fourier 
transform, 398-400
Cauchy, 165, 173 
exponential, 163-167 
gamma, 164, 170-172 
Gaussian, 164, 167-170 
Laplacian, 165 
Pareto, 165, 173-174 
mean and variance of, 174 
Rayleigh, 165 
two, joint pdf of, 248-254 
uniform, 163 
Continuous sample spaces, 24, 37-41, 79 
Continuous-time Gaussian random processes, 516 
Continuous-time Markov chains, 673-686, 690-691 
global balance equations, 680-683 
birth-and-death process, 682-683 
homogeneous transition probabilities, 673-674 
limiting probabilities for, 683-686 
mean state occupancy time, 675 
Poisson process, 674, 678-679 
queueing system, 678 
M/M/1 single-server queueing system, 681-683 
random telegraph signal, 674 
simulation of, 698-700 
state occupancy times, 675 
steady state probabilities and, 680-683 
transition rates and time-dependent state 
probabilities, 676-679 
Continuous-time random processes: 
power spectral density, 578-583 
random telegraph signal, 580 
sinusoid with random phase, 580-581 
sum of two processes, 582-583 
white noise, 581-582 
Continuous-time stochastic process, defined, 488 
Continuous-time systems: 
filtered white noise, 590 
response to random signals, 587-593 
transfer function, 588 
Convergence: 
almost-sure, 381-382, 385 
Cauchy criterion, 384
in distribution, 387
mean square convergence, 384 
in probability, 384-385 
sure, 381 
Correlated Gaussian random variables, generation of, 
631-632 
Correlated vector random variables, generation of, 
342-345 
Correlation, 258 
Correlation coefficient, 259, 494 
Correlation matrix, 319 
Cost accumulation rate, 390-392 
Covariance, 258 
Covariance matrix, 319 
diagonalization of, 324-325 
generating random vectors with, 342-344 
Cramer-Rao inequality, 423-428 
Fisher information, 423-424 
for Bernoulli random variable, 424-425
for an exponential random variable, 425 
lower bound for Bernoulli random variable, 426 
proof of, 426-428 
score function, 423-424 
statement of, 425 
Critical region, 442 
Cross-correlation, 496 
Cross-covariance, 497 
matrix, 321 
Cross-power spectral density, 579 
Cumulative distribution function (cdf), 141-146 
conditional, 152-155 
defined, 141-142 
limiting properties of, 147 
proof of properties of, 146 
three coin tosses, 142 
uniform random variable in the unit interval, 143 
Cyclostationary random processes, 525-529 
pulse amplitude modulation, 526-527 
with random phase shift, 528 


D 


Decision rule, 442 
Decreasing sequence of events, 76 
Delta function, 151-152 
Demodulation of noisy signal, 604-605
DeMorgan’s Rules, 28 
Deterministic models, 4 
Diagonalization, of covariance matrix, 324-325 
Difference, of sets, 27 
Differential entropy, 206 
of a Gaussian random variable, 207 
of a uniform random variable, 206 
Discrete Fourier transform (DFT): 
calculating distributions using, 392-400 
defined, 394
Discrete random variables, 99-104, 146
calculating distributions using the discrete Fourier 
transform, 393-398 
expected value and moments of, 104-111 
generation of, 127-129 
generation of Poisson random variable, 128 
generation of tosses of a die, 128 
pairs of, 236-241 
pdf for, 151 
probability mass function (pmf), 99-100 
properties of, 115 
uniform, mean of, 105-106 
Discrete sample spaces, 24, 35-37, 79 
Discrete-time birth-and-death process, 689-690 
Discrete-time Markov chains, 650-660 
binomial counting process, 653 
Google PageRank, 657-658 
homogeneous transition probabilities, 651 
n-step transition probabilities, 653-654 
simulation of, 696-698 
state probabilities, 654-658 
steady state probabilities, 658-660
Discrete-time random process, 495, 582-583 
binomial counting and random walk processes, 
501-507 
cross-power spectral density, 583 
iid random process, 498-500 
independent increments and Markov properties of 
random processes, 500-501 
moving average process, 584 
power spectral density, 583-585 
signal plus noise, 584-585 
white noise, 584 
Discrete-time systems: 
filtered white noise, 594 
response to random signals, 593-597 
transfer function, 593 
discrete_rnd function, 128 
Disjoint sets, 27 
Distribution, convergence in, 387 
Distribution to data, testing the fit of, 462-468 
Distributive properties, 28 


E 


Eigenfunctions, 547 
Eigenvalues, 547 
80/20 rule, and the Lorenz curve, 126-127 
Einstein, Albert, 578fn 
Einstein-Wiener-Khinchin theorem, 578 
Elementary events, 25 
probability of, 35 
Elements, 25 
Embedded Markov chains, 675 
simulation using, 776-779 
Empty set, 26
Engset formula, 789
Entropy, 202-212 
of a binary random variable, 203-204 
defined, 202 
differential, 206 
of a geometric random variable, 205 
maximum 
method of, 211-212 
as a measure of information, 207-210
of a quantized continuous random variable, 206
of a random variable, 202-207
reduction of, through partial information, 204 
relative, 204 
Equally likely outcomes, 35 
Ergodic Markov chain, 668-670 
Ergodic theorem, 540 
Ergodicity: 
and exponential correlation, 543 
of self-similar process and long-range 
dependence, 543-544 
Error control by retransmission, 64 
Error control system, 12 
Error correction coding, 62-63 
Error detection and correction methods, 13 
Estimation: 
Bernoulli random variable, 421 
Cramer-Rao inequality, 423-428 
maximum likelihood, 419-430 
of mean and variance for Gaussian random 
variable, 422-423
parameter, 415-419 
Poisson random variable, 421-422 
and sample mean, 416-417 
using causal filters, 614-617 
using the entire realization of the observed 
process, 613-614 
Estimation error, 334 
Estimation of random variables, 332-342 
MAP and ML estimators, 332-334 
minimum MSE estimator, 336-338 
minimum MSE linear estimator, 334-335 
using a vector of observations, 338-342 
Estimators: 
bias, 416 
consistent, 418 
for the exponential random variable, 417-418 
finding, 419 
properties of, 416-419 
sample mean, consistency of, 418 
sample variance, consistency of, 418-419 
strongly consistent, 418 
unbiased, 417 
Event classes, 29-30, 70-75 
Lisa and Homer’s urn experiment, 72-73 
Events:
certain, 24
elementary, 25 
impossible, 24 
null, 24 
product form, 304 
Expected value(s), 11 
betting game, 106 
discrete random variables, 104-111 
of the indicator function, 159 
of a random variable, 155-163 
of a sinusoid with random phase, 158 
of Y = g(X), 157-159 
Exponential failure law, 190-191 
Exponential random variables, 163-167 
estimators for, 417-418 
example, 150 
Fisher information for, 425 


F 


Failure rate function, 189-192 
Fast Fourier transform (FFT): 
algorithms, 396-397 
and random processes, 628-630 
Filtered noisy signal, 493 
Filtered Poisson impulse train, 512-513 
Filtered white noise: 
continuous-time systems, 590 
discrete-time systems, 594 
Filtering problem, 606 
Filtering techniques, random processes, 628-630 
Finite sample space, 30 
Finite-source queueing systems, 734-738 
arriving customer’s distribution, 737-738 
Web server system, 736-737 
Finite-state continuous-time Markov chains, 694 
stationary pmf for, 693 
Finite-state discrete-time Markov chain, 693-694 
Finite-state Markov chains, 667 
First-order autoregressive (AR) process, 594-595 
Fisher information, 423-424
for Bernoulli random variable, 424-425
Fourier series, 544-546 
and Karhunen-Loeve expansion, 544-550 
Fourier transform, 184-185 


G 


Gamma random variables, 164, 170-172 
generating, 199-200, 201 
implementing rejection method for, 200 
Laplace transform of, 189 
pdf of, 170 

Gaussian random processes, 515-518 
continuous-time, 516
iid discrete-time, 515-516
moving average process, 524-525 
Gaussian random variables, 164, 167-170 
cdf for, 167 
Chernoff bound for, 187 
and communications systems, 168 
conditional pdf of, 327-328 
confidence intervals: 
summary of, 437 
for the variance of, 436-437
differential entropy of, 207
estimation of mean and variance for, 422-423
joint characteristic function of, 331-332 
jointly, 278-284 
linear transformation of, 328-330 
one-sided test for mean of, 449-450 
as UMP, 451 
pdf for, 167 
sampling distribution for the sample mean 
of, 413-414 
sampling distributions for, 437-441 
testing the variance of, 454-455
two, testing the means of, 446-447
two-sided test for mean of: 
known variance, 452-453 
unknown variance, 453-454 
variance of, 160 
Geometric probability law, 63-64 
Geometric random variables, 103, 119-120 
defined, 119 
entropy of, 205 
mean of, 106 
properties of, 115 
variance of, 110-111 
Global balance equations, 680-683 
Google PageRank, 657-658, 692-693 
algorithm, 671 


H 


Homogeneous transition probabilities, 651, 673-674 
Hurst parameter, 544 
Hyperexponential random variable, 202 
Hypothesis testing, 441-455 
alternative hypothesis, 444 
Bayes hypothesis testing, 455-460 
composite hypotheses, testing, 449-455 
composite hypothesis, 444 
confidence intervals and, 455 
critical region, 442 
decision rule, 442 
fair coin, testing, 442-443 
improved battery, testing, 443 
likelihood ratio function, 446 
maximum likelihood test, 448
Neyman-Pearson, 446-448
null hypothesis, 441-442 
p-value of the test statistic, 443 
rejection region, 435, 442, 445-446 
significance level, 442 
significance testing, 441-443 
objective of, 441-442 
simple hypotheses: 
defined, 444 
testing, 444-449 
summary of, 455 
testing the means of two Gaussian random 
variables, 446-447 


I 


Ideal filters, 591-592 
iid Bernoulli random variables, 383, 492-493
iid discrete-time Gaussian random processes, 515-516
iid Gaussian random variables, 493 
iid Gaussian sequence, joint pdf of, 505 
iid interarrivals, arrival rate for, 389 
iid random process, 498-500 
autocorrelation function of, 498 
autocovariance of, 498 
Bernoulli random process, 499 
mean of, 498 
random step process, 499-500 
Impossible event, 24 
Impulse response, 588 
Increasing sequence of events, 76 
Independence of events, 53-59 
Independent events, 79 
examples of, 55 
Independent experiments, 57 
Independent Gaussian random variables: 
generating, 284-286 
radius and angle of, 275-276 
sum of, 362 
Independent, identically distributed (iid) random 
variables, 361 
iid Bernoulli random variables, 383 
pdf of, 365 
relative frequency, 365-366 
sum of, 362-363 
Independent increments, 502-504 
Independent Poisson arrivals, merging of, 310-311 
Independent random processes, 496 
Independent random variables, 254-257 
covariance of, 259 
product of functions of, 258
Independent replications, simulation through, 772-773
Indexed family of random variables, See Random 
processes 
Indicator function, 102 




Infinite smoothing, 613-614 

Initial probability assignment, 34, 79 
satisfying axioms of probability, 41 

Innovations, 620 

Interarrival (cycle) times, 387-392 

Internet scale systems, 15-16 

Intersection, 27 

Irreducible class, 663, 667 


J 


Jackson’s theorem, 758-762 
proof of, 760-762 
statement of, 759 
Joint characteristic function, 322-324 
of Gaussian random variables, 331-332 
Joint cumulative distribution function, 243, 305 
Joint distribution functions, 305-309 
vector random variables, 305-309 
joint cumulative distribution function, 305 
joint probability density function, 307 
joint probability mass function, 305-306 
Joint moments, 258 
Joint probability density function, 307 
Joint probability mass function, 236, 305-306 
Jointly Gaussian random variables: 
generating vectors of, 344-345 
linear transformations of, 277-278 
MAP and ML estimators, 333-334 
minimum mean square error, 338 
pairs of, 278-284 
estimation of signal in noise, 282-283 
rotation of, 283-284 
sum of, 330 
Jointly Gaussian random vectors, 325-328 
Jointly stationary processes, 519 
Jointly wide-sense stationary processes, 521 


K 


Kalman filter, 617-622
algorithm, 621 
Karhunen-Loeve expansion, 325, 546-550, 607fn 
defined, 546 
and Fourier series, 544-550
of Wiener process, 548-549, 550
Khinchin, A. Ya., 578fn 
Kirchhoff’s voltage and current laws, 4 
Kronecker delta function, 545, 548 


L 


Langevin equation, 538 

Laplace transform, 188-189 

Laplacian random variables, 165 
example, 150 


Laws of large numbers: 
and sample mean, 365-366 
strong law, 368-369 
weak law, 367 
Likelihood function, 420 
Likelihood ratio function, 446, 457 
Lindley’s recursion, 778-779 
Linear combinations of deterministic functions, 
generating, 553-554 
Linear prediction problem, 610-611 
Linear systems: 
optimum, 605-617 
response to random signals, 587-593 
continuous-time systems, 587-593 
discrete-time systems, 593-597 
Linear transformations: 
of Gaussian random variables, 330 
of jointly Gaussian random variables, 277-278 
pdf of, 276-278 
of random vectors, 320-322 
Little’s formula, 715-718 
mean number in queue, 718 
server utilization, 718 
Long-term arrival rates, 387-392 
Long-term averages, 359-410 
time, 390-392 
Long-term proportion of “up” time, 390-391
Lorenz curve, 126-127 


M 


m-Erlang random variables, 170-172, 202 
M/G/1 analysis, embedded Markov 
chains, 745-750 
M/G/1 queueing systems, 738-745 
delay and waiting time distribution in, 752-754 
mean delay, 740-741 
with priority service discipline, 742-745 
for type k customers, 743 
mean waiting time, 741 
for type 1 customers, 742 
for type 2 customers, 743 
mean waiting time for type 1 customers, 742 
mean waiting time for type 2 customers, 743 
number of customers in, 747-750 
Pollaczek-Khinchin mean value formula, 741
Pollaczek-Khinchin transform equation, 750, 754
residual service time, 739-740
M/H2/1 queueing system, 750-751, 753
M/M/∞ queueing system, 733-734
transition rate diagram for, 733 
M/M/1 queue, 718-727 
arriving customer’s distribution, 723-724 
carried load, 726 
delay distribution, 723-724 
distribution of number in the system, 719-722
interarrival times, 718
offered load, 726 
system with finite capacity, 724-727 
traffic intensity, 726 
M/M/1 simulation, regenerative method for, 781 
M/M/c queueing system, 727-732 
distribution of number in, 727-731 
waiting time distribution for, 731-732 
M/M/c/c queueing system, 732-733 
Erlang B formula, 733 
transition rate diagram for, 732 
MAP estimator, 332-334 
compared to ML estimator, 333 
for X given the observation Y, 333 
Marginal cdf’s, 305 
Marginal cumulative distribution functions, 243 
Marginal pdf's, 307 
Marginal pmf’s, 306 
Marginal probability mass functions, 241-242 
Markov chains, 79, 647-712 
age of a device, 670 
binomial counting process, 661 
cartridge inventory (example), 693-694 
classes of states, 660-662 
continuous-time, 674-686
defined, 66, 650
discrete-time, 650-660, 675
n-step transition probabilities, 653-654 
state probabilities, 654-658 
steady state probabilities, 658-660 
embedded, 675 
finite-state, 667 
Google PageRank algorithm, 671 
irreducible class, 661, 665 
limiting probabilities, 667-673 
with multiple irreducible classes, 672-673 
numerical techniques for, 692-700 
random walk, 660 
recurrence properties, 660-665 
simulation of, 695-700 
continuous-time Markov chains, 698-700 
discrete-time Markov chains, 696-698 
states of, 661 
stationary probabilities of, 692-693 
structures for, 666 
time-dependent probabilities of, 693-694 
time-reversed, 686-692 
trellis diagram for, 65 
two-state, for speech activity, 651-653 
Markov inequality, 181, 183 
Markov processes, 647-648 
defined, 647 
moving average, 648-649 
Poisson process as, 649 
random telegraph signal, 649 
state of, 648
sum processes, 648
Wiener process as, 650 
Markov property, 648 
Mathematical models: 
defined, 2 
predictions of, 3 
and system design/modification decisions, 2 
as tools in analysis/design, 2-4
Matlab®, 67, 70, 129, 200, 285, 393 
Maximum a posteriori (MAP) 
estimator, 332-334 
Maximum likelihood estimation, 419-430 
defined, 419 
likelihood function, 420 
log likelihood function, 420 
maximum likelihood method, 420 
Poisson distributed typos (example), 419 
Maximum likelihood (ML) estimators, 333-334 
asymptotic properties of, 428-430 
Mean: 
of random variables, 155-163 
discrete, 104-111 
exponential, 156-157 
Gaussian, 156 
uniform, 156 
of shot noise process, 514 
Mean ergodic: 
defined, 542 
Mean function, 494 
Mean recurrence time, 668 
Mean square continuity, 529-532 
Mean square convergence, 384 
Mean square derivatives, 532-535 
Mean square error (MSE), 338 
Mean square estimation error, 417 
Mean square integrals, 535-537 
Mean square periodic, 523 
Mean state occupancy time, 677 
Mean time to failure (MTTF), 190 
Mean value analysis, 766-769 
arrival theorem, 767-770 
proof of, 769-770 
Mean vector, 318-319 
Memoryless property, 166-167 
Mersenne Twister, 67 
Message transmissions, 102-103 
Minimum mean square error (MMSE) linear 
estimator, 334 
Minimum MSE estimator, 336-338 
Minimum MSE linear estimator, 334-335 
compared to linear MSE estimator, 336-337 
Mixed type, random variables of, 147 
ML estimator, 333-334 
compared to MAP estimator, 333 
for X given the observation Y, 333 
Modeling process, 3
Models:
defined, 2 
usefulness of, 2-3 
Modulator, 601 
Moment theorem, 185-186 
Moving average process, 507, 595 
Multinomial probability law, 63 
Multiple realizations, 771 
Multiple server systems, 727 
M/M/∞ queueing system, 733-734
M/M/c/c queueing system, 732-733
M/M/c queueing system, 727-732
Mutually exclusive sets, 27 


N 


n factorial, 44 
Negative recurrent state, 668 
Neyman-Pearson hypothesis testing, 446-448 
Nonindependent events, 79 
examples of, 55 
Nonindependent Gaussian random variables, sum of, 272 
Normal random variables, 411 
Null event, 24 
Numerical techniques: 
for Markov chains, 692-700 
for processing random signals, 628-633 
fast Fourier transform (FFT) methods, 628-630 
filtering techniques, 630-631 
Nyquist sampling rate, 597-598, 600 


O


Octave, 67, 70, 129, 200, 285, 393 
Ohm’s law, 4 
One-dimensional random walk, 502-504 

autocovariance of, 505 

independent and stationary increments of, 504 
Optimum filter, defined, 606 
Optimum linear systems, 605-617 

estimation: 

using causal filters, 614-617 
using the entire realization of the observed 
process, 613-614 

orthogonality condition, 606-610 

prediction, 610-612 
Optimum minimum mean square estimator, 338 

diversity receiver, 340-341 

second-order prediction of speech, 341-342 
Ornstein-Uhlenbeck process, 538-539, 591 
Orthogonal random processes, 496 
Orthogonal random variables, 258 
Orthogonality condition, 335, 339, 606-610 
Outcome, experiments, 4-5 

defined, 22 


P 


Packet voice transmission system, 9-11, 391 


Parameter estimation, 415-419 
Pareto distribution, 173 
Pareto random variables, 165, 173-174 
mean and variance of, 174 
Partition, 73 
Periodic state, 665 
Periodogram estimate, 585-587 
defined, 578 
smoothing of, 626-628 
variance of, 623-626 
Point estimator, 415 
Points, 25 
Poisson distributed typos, 415-416
Poisson process, 531 
defined, 508 
as Markov processes, 651 
Poisson random variables, 120-124 
arrivals at a packet multiplexer, 122 
defined, 120 
errors in optical transmission, 123 
estimation of p for, 421-422 
mean/variance of, 122 
pmf for, 120, 123
for a probability generating function, 188
properties of, 116
queries at a call center, 122 
poisson_rnd function, 129 


Pollaczek-Khinchin mean value formula, 741


Pollaczek-Khinchin transform equation, 750, 754

Population, defined, 412 

Positive recurrent state, 668 

Power set of S, 30 

Power spectral density, 577-587 


continuous-time random processes, 578-583 


cross-power spectral density, 579 
defined, 578 


discrete-time random processes, 583-585 


estimating, 622-628 
periodogram estimate: 
smoothing of, 626-628 
variance of, 623-626 
as time average, 585-587 
Prediction problem, 606
for long-range and short-range dependent processes, 611-612
Probability: 
a posteriori, 52 


axiomatic approach to a theory of, 8, 411 


axioms of, 21, 30-41, 79 
continuous sample spaces, 37-41 
discrete sample spaces, 35-37 


convergence in, 384-385 
of an outcome, 5 
of sequences of events, 75-78 
using counting methods to compute, 41-47 
Probability density function of X (pdf), 148-155 
conditional, 152-155 
defined, 148 
of discrete random variables, 150-151 
of exponential random variables, 150 
of Laplacian random variables, 150 
of uniform random variables, 149-150 
Probability generating function, 187-189 
for a Poisson random variable, 188 
Probability law, 79 
for a random experiment, 30-31 
Probability mass function (pmf), discrete random 
variables, 99-100 
Probability models, 1-20, 4, 79 
building, 8-9 
defined, 1 
Probability theory, 13-14, 411 
basic concepts of, 21-79 
Product form, 236 
Product-form events, 304 
Pseudo-random number generators, 67-69, 79 
Pulse amplitude modulation, 526-527, 531-532 
with random phase shift, 528 


Q 


Quadrature amplitude modulation (QAM) 
method, 603-604 
Quality control, 52-53 


Quantized continuous random variables, entropy of, 206 


Queue discipline, 715 
Queueing theory, 713-796 

arrival theorem, 766-770 
proof of, 769-770 

Burke’s theorem, 754-758 

closed networks of queues, 763-766 
theorem, 763-764 

finite-source queueing systems, 734-738 
arriving customer’s distribution, 737-738 
Web server system, 736-737 

Jackson’s theorem, 758-762 
proof of, 760-762 
statement of, 759 

Little’s formula, 715-718 
mean number in queue, 718 
server utilization, 718 

M/G/1 analysis, embedded Markov chains, 745-750 

M/G/1 queueing systems, 738-745 
delay and waiting time distribution in, 752-754 
mean delay for type k customers, 743 
mean delay in, 740-741
mean delay with priority service discipline, 743-745
mean waiting time, 741
mean waiting time for type 1 customers, 742
mean waiting time for type 2 customers, 743
number of customers in, 747-750
Pollaczek-Khinchin mean value formula, 741
Pollaczek-Khinchin transform equation, 750, 754
residual service time, 739-740

M/H2/1 queueing system, 750-751, 753

M/M/1 queue, 718-727
arriving customer’s distribution, 723-724
delay distribution, 723-724
distribution of number in the system, 719-722
interarrival times, 718
system with finite capacity, 724-727

mean value analysis, 766-769

multiple server systems, 727
M/M/∞ queueing system, 733-734
M/M/c/c queueing system, 732-733
M/M/c queueing system, 727-732

open queueing networks, 758-760, 763

queueing system:
elements of, 714-715
models, 715
number of customers in, 716-717
simulation and data analysis of, 771-782


R


Random amplitude, sinusoid with, 495 
Random experiments, 4 


events, 24-25 

probability law for, 30-31 
sample space, 22-24 
sequential, 21 

simulation of, 70 
specifying, 21-30 


Random input, response of a linear system to, 537-539 
Random number generators, 67-70, 101 


generation of numbers from the unit interval, 68-69 
pseudo-, 67-69 
simulation of random experiments, 70 


Random phase, sinusoid with, 495-496 
Random processes, 487-576 


continuity, 529-532 

defined, 488-491 

derivatives, 532-535 

discrete-time processes, 498-507 

filtered Poisson impulse train, 512-513 

Gaussian, 515-518 

generation of, 550-554, 631-633 

independent increments and Markov properties 
of, 500-501 

integrals, 535-537 

mean of shot noise process, 514

mean square continuity, 529-532

mean square derivatives, 532-535 

mean square integrals, 535-537 

multiple, 496-497 

Poisson process, 507-511 

random binary sequence, 489 

random sinusoids, 489 

random telegraph signal (process), 511-512 
specifying, 491-497 

stationary, 518-528 

time averages of, 540-544 

time samples, joint distributions of, 492-493 


Random sample, 412, 415 
Random signals: 


amplitude modulation by, 601-605 

analysis/processing of, 577-646 

bandlimited random processes, 597-605
amplitude modulation by random signals, 601-605 
sampling of, 597-601 

discrete-time systems, 593-597 

Kalman filter, 617-622
algorithm, 621 

numerical techniques for processing, 628-633 
fast Fourier transform (FFT) methods, 628-630 
filtering techniques, 630-631 

optimum linear systems, 605-617 
estimation using causal filters, 614-617 
estimation using the entire realization of the observed process, 613-614

orthogonality condition, 606-610 
prediction, 610-612 

power spectral density, 577-587 
continuous-time random processes, 578-583 
defined, 578 
discrete-time random processes, 583-585 
estimating, 622-628 
as time average, 585-587 

response of linear systems to, 587-593 


Random telegraph signal (process), 511-512 
Random variables: 


Bernoulli, 102 
coin toss, 117 
estimation, 421, 428 
Fisher information for, 424-425 
mean of, 105 
properties of, 115 
variance of, 110 
beta, 165, 172-173 
generating, 198 
betting games, 101 
binomial, 103 
Chernoff bound for, 375-377 
coin toss, 118 
coin tosses and, 101 
defined, 117-118
mean of, 118-119
negative, properties of, 116 
properties of, 115 
redundant systems, 119 
sampling distribution of, 414-415 
three coin tosses and, 105 
variance of, 119 
Cauchy, 165, 173 
computer methods for generating, 194-202 
rejection method, 196-201 
transformation method, 195-196 
continuous, 146-149, 163-174 
beta, 165, 172-173 
calculating distributions using the discrete Fourier 
transform, 398-400 
Cauchy, 165, 173 
exponential, 163-167 
gamma, 164, 170-172 
Gaussian, 164, 167-170 
Laplacian, 165 
Pareto, 165, 173-174 
Rayleigh, 165 
two, joint pdf of, 248-254 
uniform, 163 
convergence of sequences of, 378-387 
correlated vector random variables, generating, 
342-345 
cumulative distribution function (cdf), 141-146 
defined, 96 
with differences in type, 247-248 
communication channel with discrete input and 
continuous output, 247-248 
discrete, 99-104, 146 
calculating distributions using the discrete Fourier 
transform, 393-398 
expected value and moments of, 104-111 
generation of, 127-129 
pairs of, 236-241 
pdf for, 151 
probability mass function (pmf), 99-100 
properties of, 115 
uniform, mean of, 105-106 
discrete random variables, pairs of, 236-241 
estimation of, 332-342 
expected value, 155-163 
of functions of, 107-109 
exponential, 163-167 
estimators for, 417-418 
example, 150 
Fisher information for, 425 
formal definition of, 99, 141 
functions of, 174-181 
gamma, 164, 170-172, 425 
generating, 199-200, 201 
implementing rejection method for, 200
Laplace transform of, 189
pdf of, 170 
Gaussian, 167-170 
cdf for, 167 
Chernoff bound for, 187 
and communications systems, 168 
conditional pdf of, 327-328 
confidence intervals, 436-437
differential entropy of, 207
estimation of mean and variance for, 422-423
joint characteristic function of, 331-332 
jointly, 278-284 
linear transformation of, 328-330 
one-sided test for mean of, 449-450, 451 
pdf for, 167 
sampling distribution for the sample mean 
of, 413-414 
sampling distributions for, 437-441 
testing the variance of, 454-455 
two-sided test for mean of, 452-454 
two, testing the means of, 446-447 
variance of, 160 
Gaussian random variables, 164 
generation of functions of, 201-202 
generation of mixtures of, 202 
m-Erlang random variable, 202 
geometric, 103, 119-120 
defined, 119 
entropy of, 205 
mean of, 106 
properties of, 115 
variance of, 110-111 
hyperexponential, 202 
iid Bernoulli, 383, 492-493
iid discrete-time Gaussian, 515-516 
iid Gaussian, 493 
independent, 254-257 
covariance of, 259 
product of functions of, 258 
independent, identically distributed (iid), 361 
joint cdf of x and y, 242-247 
jointly Gaussian: 
generating vectors of, 344-345 
linear transformations of, 277-278 
MAP and ML estimators, 333-334 
minimum mean square error, 338 
pairs of, 278-284 
rotation of, 283-284 
sum of, 330 
Laplacian, 165 
example, 150 
m-Erlang, 170-172, 202 
marginal probability mass functions, 241-242 
maximum/minimum of, 310 
mean of, 155-163
of mixed type, 147
notion of, 96-99 
nth moment of, 161 
orthogonal, 258 
pairs of, 233-302 
Pareto, 165, 173-174 
mean and variance of, 174 
Poisson, 120-124 
arrivals at a packet multiplexer, 122 
defined, 120 
errors in optical transmission, 123 
estimation of p for, 421-422
mean/variance of, 122 
pmf for, 120, 123 
for a probability generating function, 188 
properties of, 116 
queries at a call center (example), 122 
square-law device, 107-108 
St. Petersburg paradox, 107 
standard deviation of, 109 
sums of, 257-258, 359-410 
mean and variance of, 360-361 
pdf of, 361-363 
random number of variables, 364-365
transformations of, 274-275 
two, 233-236 
expected value of a function of, 257-258 
functions of, 271-278
joint moments and expected values of a function of, 257-261
sum of, 271-272 
types of, 146-147 
uncorrelated, 260-261 
uniform, 101, 124-125, 163-164 
differential entropy of, 206 
example, 149-150 
properties of, 116 
in unit interval, 124-125, 143 
variance of, 160 
variance of, 109-111, 160-163 
Gaussian, 160 
three coin tosses, 110 
uniform, 160 
voice packet multiplexer, 108-109 
Zipf, 125-127 
80/20 rule and the Lorenz curve, 126-127 
properties of, 116 
rare events and long tails, 126 


Random vectors: 


linear transformations of, 320-322 
transformations of, 311-312 


Random walk: 


autocovariance of, 506 
independent and stationary increments of, 504 
Markov chains, 664
Rayleigh continuous random variables, 165
Realization, 488 
Recurrence properties, 662-667 
Recurrent state, 667-669
random walk, 668 
Redundant systems, reliability of, 311 
Regression curve, 336 
Rejection method, 196-201 
implementing for gamma random variables, 200 
Rejection region, 442, 445-446 
Relative complement, of sets, 27 
Relative entropy, 204 
Relative frequency, 5-6, 365-366, 412
properties of, 7 
Reliability, 13 
defined, 189 
of redundant systems, 311 
Reliability calculations, 189-194 
exponential failure law, 190-191 
failure rate function, 189-192 
mean time to failure (MTTF), 190 
system reliability, 192-194 
Weibull failure law, 192 
Renewal counting process, 387 
Repair cycles, 389 
Replication through regenerative cycles, 780-782 
Residual lifetime, 391-392 
Residual service time, 739-740 
Resource-sharing systems, 14-15 


S 


Sample function, 488 
Sample mean, 10, 365-366, 412 
and estimation, 416-417
mean and variance of, 413 
Sample mean estimators, consistency of, 418 
Sample path, 488 
Sample point, 22 
Sample space, 4, 22, 79 
continuous, 24, 37-41, 79 
discrete, 24, 35-37, 79 
Sample variance, 416-417, 437
Sample variance estimators, consistency of, 418-419 
Sampling: 
permutations of n distinct objects, 43-47 
sampling with replacement/with ordering, 47 
sampling without replacement/without 
ordering, 44-46
using counting methods to compute: 
sampling with replacement/with ordering, 42 
sampling without replacement/without 
ordering, 42-43 
Sampling distribution: 
of binomial random variable, 414-415
defined, 412
for Gaussian random variables, 437-441 
for the sample mean: 
large n, 414 
of Gaussian random variables, 413-414 
Scattergram, 259 
Scattergram plot, 236 
Second moment of X, 109 
Sequence of random variables, 378-387 
Sequences of events, probability of, 75-78 
Sequential experiments, 59-66 
binomial probability law, 60-62 
geometric probability law, 63-64
independent experiments, sequences of, 59 
multinomial probability law, 63 
sequences of dependent experiments, 64-66 
Sequential random experiments, 21 
Service discipline, 717 
Service time, 716 
Set operations/set relations, 26 
Set theory, 21 
review of, 25-29 
Shot noise process, 501 
Signal plus noise, 497 
autoregressive, filtering of, 609-610 
filtering of, 609 
Signal-to-noise ratio (SNR), defined, 283 
Significance level, 442 
Significance testing, 441-443 
objective of, 441-442 
Simple hypotheses: 
defined, 444 
radar detection problem, 444-445 
testing, 444-449 
Type I and Type II error probabilities, using sample size to select, 445 
Simulation: 
of queueing systems, 771-782 
approaches to, 771-772 
regenerative method for, 780 
replication through regenerative cycles, 780-782 
through independent replications, 772-773 
time-sampled process, 773-776 
using embedded Markov chains, 776-779 
Simulation based Markov chains, 776-779 
Single realization, 771, 774 
Smoothing, 606 
infinite, 613-614 
of periodogram estimate, 626-628 
Spectral factorization, defined, 615fn 
Square-law device, 107-108 
St. Petersburg paradox, 107 
Stable system, 594fn 
Standard deviation, of a random variable, 109, 160 
Standby redundancy, 272-273 
State, of Markov processes, 648 
State occupancy times, 675 
State probabilities, 654-658 
State transition diagram, two-state process with, 659-660 
Stationary probabilities, of Markov chains, 692-693 
Stationary random processes, 518-528 
cyclostationary, 525-529 
iid random process, 519-520 
jointly stationary processes, 519 
random telegraph signal, 520-521 
stationarity and transience, 518-519 
wide-sense, 521-524 
Gaussian random processes, 524-525 
Stationary state pmf, of Markov chains, 659, 680 
Statistical inferences, 412 
Statistical regularity, 5-6 
Statistics, 411-486 
defined, 411 
origin of, 411-412 
samples, 411-415 
sampling distributions, 411-415 
Steady state probabilities, 658-660 
Stirling’s formula, 44 
Stochastic matrix, 694 
Stochastic processes, See Random processes 
Strong law of large numbers, 368-369 
Strongly consistent estimators, 418 
Subset, 25, 79 
Sum processes, 501-507, 648 
binomial counting process, 501-502 
defined, 501 
one-dimensional random walk, 502-504 
Sum random processes, generating, 550-553 
Sure convergence, 381 
System reliability, 58-59 
System saturation point, 737 


T 


Theorem on total probability, 50 
Time averages, of random processes, 540-544 
Time-dependent probabilities of Markov chains, 693-694 
cartridge inventory, 695 
Time-invariant systems, 588 
Time-reversed Markov chains, 686-692 
continuous-time Markov chains, 690-691 
discrete-time birth-and-death process, 689-690 
Time-reversed process, 687 
Time-sampled process simulation, 773-776 
method of batch means, 775 
transient of M/M/1 queue using, 773 
Time samples, joint distributions of, 492-493 
Total delay, 715 
Total probability, theorem on, 50 
Transfer function: 
continuous-time systems, 588 
discrete-time systems, 593 
Transform methods, 184-189 
characteristic function, 184-187 
Laplace transform, 188-189 
probability generating function, 187-189 
Transformation method, 195-196 
Transformations: 
pdf of, 312-317 
of uncorrelated random vector, 321 
to uncorrelated random vector, 321-322 
Transient state, 663 
binomial counting process, 664 
random walk, 664 
Transition pdf, 501 
Transition pmf, 501 
Translated unit step function, 151 
Transmission errors, 103 
Tree diagram, 49 


U 


Unbiased estimators, 366, 417 
Uncorrelated jointly Gaussian random variables, 
independence of, 327 
Uncorrelated random processes, 497 
Uncorrelated random variables, 260-261 
Uncorrelated random vector: 
transformation of, 321 
transformation to, 321-322 
Uniform random variables, 101, 124-125, 163-164 
differential entropy of, 206 
example, 149-150 
properties of, 116 
in unit interval, 124-125 
in the unit interval, 143 
variance of, 160 
Uniformly most powerful (UMP) test, 451 
Union, 27 
Unit-sample response, 593 
Unit step function, 151 
Universal set, 25 


V 


Variance: 
analog-to-digital conversion, 161-163 
of random variables, 109-111, 160-163 
Gaussian, 160 
three coin tosses, 110 
uniform, 160 
Variance function, 494 
Vector random variables, 303-358 
arrivals at a packet switch, 304, 306-307 
audio signal samples, 304 
covariance matrix, 319 
defined, 303 
events, 304-305 
expected values of, 318-325 
functions of, 309-317 
independence, 309 
joint distribution functions, 305-309 
joint cumulative distribution function, 305 
joint probability density function, 307 
joint probability mass function, 305-306 
joint Poisson counts, 304 
jointly continuous random variables, 307 
mean vector, 318-319 
multiplicative sequences, 308-309 
probabilities, 304-305 
Voice packet multiplexer, 108-109 


W 


Waiting time, 715 
Weak law of large numbers, 367 
Web server systems, 14-15 
configuration of, 15 
simple model for, 14 
Weibull failure law, 192 
White Gaussian noise: 
defined, 535 
generation of, 632-633 
integral of, 537 
and Wiener random process, 534-535 
White Gaussian noise process, 550 
Wide-sense stationary Gaussian random processes, 524-525 
Wide-sense stationary random processes, 521-524 
Wiener filter, 616-617 
Wiener-Hopf equations, 614 
Wiener-Khinchin theorem, 578fn 
Wiener, Norbert, 578fn 
Wiener process, 516-517, 531 
as Markov processes, 652 
sample functions of, 517 
Wiener random process, 517 
and white Gaussian noise, 534-535 
WSS random process: 
sampled, digital filtering of, 600 
sampling, 599 


Y 


Yule-Walker equations, 611 


Z 


Zipf, George, 125 
Zipf random variables, 125-127 
80/20 rule and the Lorenz curve, 126-127 
properties of, 116 
rare events and long tails, 126 


